Experimental Methods in Food Tasting*


L. L. Thurstone 
The Psychometric Laboratory, The University of Chicago 


When I first learned about the food-tasting 
laboratory at the Quartermaster Corps here in 
Chicago, I was much interested to see it. I 
found in that laboratory five booths which were 
specially designed for experimental work in this 
area. I volunteered as a subject, and I was 
seated at a table next to a wall. On the other 
side of the wall were technicians who placed 
food specimens on a small revolving platform. 
I was presented with a number of samples of 
tomato juice. The instructions told me to 
rank the specimens according to my prefer- 
ences. These specimens had been stored for 
different lengths of time at different tempera- 
tures in different types of containers. The 
practical question was to determine whether 
people can taste the differences and whether 
they will reject some of these foods. It has 
been found that the human subject can make 
discriminations to accept or reject foods with 
criteria that are not evident in the chemical 
and bacteriological analyses. In developing 
new foods it is then essential to ascertain 
whether people will eat them. This is called 
food acceptance research. 

It is an interesting circumstance that the 
laboratory methods for determining sensory 
limens and hedonic values have here an im- 
portant practical application. This is the old- 
est part of experimental psychology, and it is 
called psychophysics. Traditionally, this field 
has been known among students of psychology 
as a theoretical and very dull subject, but it is 
now coming to be regarded as an interesting 
aspect of psychological method. 

One of the first problems of the food-tasting
laboratory is to arrange the procedures to
obtain a maximum of information from each
subject in the minimum time. In analyzing a new
food product by taste experiments it is desirable
to compare different specimens which
have been prepared and kept under different
conditions so that it will be known within what
latitude the product remains not only edible
but also desirable.

*This paper was read at a Research Conference
sponsored by the Council on Research of the American
Meat Institute, March 24, 1950, at the University of
Chicago, Chicago, Illinois.

One of the controls in the food-tasting booths 
is the color of the illumination. The examiner 
can instantly change the color of the food that 
is being rated. A piece of butter suddenly 
looks like a piece of lard, and acceptability 
may be affected. 

A theoretically more interesting problem is 
that of predicting from the ratings what pro- 
portion of the subjects will choose food A when 
it is offered in competition with foods B, C, and 
D, for example. It is not possible to experiment
explicitly with every possible food combination.
There are too many. How should
the ratings be made for predictions of this kind? 
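To give a sense of how quickly the combinations grow, the short sketch below counts the distinct menus that could be drawn from a list of foods. The list of 50 foods is only an illustrative assumption, echoing the list sizes mentioned later in the paper.

```python
from math import comb

def count_menus(list_size: int, menu_size: int) -> int:
    """Number of distinct unordered sets of `menu_size` foods taken from `list_size` foods."""
    return comb(list_size, menu_size)

# Menus of four foods (A competing with B, C, and D) from a hypothetical list of 50.
print(count_menus(50, 4))  # 230300

# Allowing menus of four, five, or six foods from the same hypothetical list.
total = sum(count_menus(50, k) for k in (4, 5, 6))
print(total)  # 18239760
```

Even a modest list therefore yields far more menus than could ever be tested directly, which is why a method of predicting choice from ratings is needed.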

Taste experiments are sometimes conducted 
with an especially selected panel of expert 
tasters. Such people are selected because they 
excel in the ability to make fine discriminations 
in taste and sometimes because of their famil- 
iarity with a particular product. A different 
policy is to select a group of tasters who repre- 
sent the public that is to be pleased rather than 
a small group of experts. If the taste experi- 
ments are conducted for the purpose of pre- 
dicting relative sale or consumption for com- 
peting flavors or brands, then it seems best to 
work with an experimental population that is 
truly representative of the public for whom the 
product is manufactured. The expert taster 
can be used to supplement laboratory tests. 
Before expert tasters are put to work, they 
certainly should qualify as to their consistency
in making taste discriminations. 

Most of the experimental work in psycho- 
physics has been done with vision and hearing. 
Such experiments are carried out most easily 
with visual stimuli where the subject can see 
two objects simultaneously. With auditory 
stimuli that can not be done. The stimuli 
must then be perceived in succession. This is 
also true with taste and smell. In conducting 
taste experiments one must deal with several 
difficulties that are specially annoying. When 
the subject has tasted one specimen, he cannot 
immediately turn to the next specimen. 
Preferably, he should rinse his mouth and
several minutes should elapse before he tastes the
next sample. In the case of smell, experimental
work has the additional difficulty of adaptation.
For some odors continued exposure
reduces the ability to discriminate. 

Because of these difficulties in taste and 
smell, it is essential to arrange the procedure 
so that the maximum information is obtained 
with the smallest number of judgments. 

One procedure is to give the subject, say 
three, four, or five samples and to ask him to 
arrange them in rank order. If many specimens
are to be tasted, they can be divided into
several groups. For each group the subject
is asked to arrange the specimens in rank order. 
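One reason rank ordering is economical is that, if the subject's preferences are assumed to be transitive, a single ranking of n specimens implies n(n − 1)/2 paired-comparison judgments. The sketch below (function names and the three rankings are hypothetical) expands rankings into pairwise tallies.

```python
from collections import Counter
from itertools import combinations

def pairwise_from_rankings(rankings):
    """Convert rank orders into paired-comparison tallies.

    Each ranking lists specimen codes from most to least preferred.
    Returns a Counter keyed by (preferred, less_preferred).
    """
    wins = Counter()
    for ranking in rankings:
        # combinations() keeps list order, so the first code in each pair ranks higher.
        for better, worse in combinations(ranking, 2):
            wins[(better, worse)] += 1
    return wins

def proportion_preferred(wins, a, b):
    """Proportion of subjects who placed specimen a above specimen b."""
    total = wins[(a, b)] + wins[(b, a)]
    return wins[(a, b)] / total if total else float("nan")

rankings = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
wins = pairwise_from_rankings(rankings)
print(sum(wins.values()))                    # 9
print(proportion_preferred(wins, "A", "B"))  # A above B for 2 of 3 subjects
```

A ranking of five specimens would, in the same way, stand in for ten paired judgments.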

The best procedure is probably to have a 
fairly large number of subjects and to ask each
person to taste each specimen only once. We
might consider two variations of this procedure.
The subject may be asked to allocate
the specimens to a set of, say, ten steps. These
ten steps would really represent intervals on a 
subjective scale of preference. One method is 
called the method of equal-appearing intervals. 
In this procedure the subject is asked to think 
of the ten steps as representing subjectively 
equal increments in preference from the worst 
to the best. 

The method of successive intervals is prob- 
ably the most suitable for this problem. In 
this method the subject samples each specimen 
only once, and he states his degree of preference 
in terms of one of a number of short descriptive 
phrases which are assigned to the successive 
intervals. There is no assumption that these 
intervals are in any sense equal. The descriptive
words or phrases must, however, be sufficiently
distinct so that the subject accepts the
rank order in which they are presented. In 
effect then, the subject expresses his degree of 
preference or dislike in terms of eight or ten 
descriptive phrases. This procedure probably 
gives the maximum amount of information in 
the shortest time. 

Each specimen is identified by a code num- 
ber or suitable name. On the record sheet the 
subject has a row of eight or ten spaces. Im- 
mediately under these spaces he finds descrip- 
tive phrases from extreme dislike, some dislike, 
indifference, some preference, to strong prefer- 
ence. When he has tasted a sample, he makes 
a check mark to indicate his degree of prefer- 
ence. This procedure is very coarse compared 
with the discrimination he could give if his 
attention were limited to only three or four 
specimens. Experimental work with other 
types of stimuli has indicated that this simple 
method with only one judgment for each speci- 
men for each subject gives results that are 
useful in prediction. It is assumed that sev- 
eral hundred subjects are available. They 
should be a random sample of the population 
for which the products are intended. The 
subjects find it easier to make judgments of 
this kind than to make the more refined paired 
comparison judgments that are called for when 
the purpose is to determine the taste limens of
the individual subjects. 
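The paper does not give the scaling computations for the method of successive intervals. As a hedged illustration only, the sketch below applies a simplified form of the usual normal-curve model: each food's hedonic values are assumed normal with a common dispersion (set to 1), the category boundaries are fixed points on the same scale, and both are estimated by least squares from the normal deviates of the cumulative proportions. The function name, the clipping constant, the equal-dispersion assumption, and the counts are all assumptions of this sketch, not the author's procedure.

```python
from statistics import NormalDist, fmean

def successive_intervals(counts, clip=0.02):
    """Rough scale values and category boundaries from successive-intervals counts.

    counts[j][k] = number of subjects placing food j in category k,
    with k = 0 the most disliked category. Assumes a complete table and
    the same discriminal dispersion (taken as 1) for every food.
    """
    inv = NormalDist().inv_cdf
    z = []
    for row in counts:
        n = sum(row)
        cum, zs = 0, []
        for c in row[:-1]:  # one boundary between each pair of adjacent categories
            cum += c
            p = min(max(cum / n, clip), 1 - clip)  # keep proportions away from 0 and 1
            zs.append(inv(p))
        z.append(zs)
    # Model: z[j][k] = t[k] - s[j]; least-squares fit with mean(s) = 0.
    grand = fmean(v for zs in z for v in zs)
    scale = [grand - fmean(zs) for zs in z]
    bounds = [fmean(zs[k] for zs in z) for k in range(len(z[0]))]
    return scale, bounds

# Hypothetical counts for three foods over four categories (dislike ... strong preference).
counts = [[30, 40, 20, 10], [10, 20, 40, 30], [5, 10, 25, 60]]
scale, bounds = successive_intervals(counts)
print([round(s, 2) for s in scale])   # the third food gets the highest scale value
print([round(t, 2) for t in bounds])  # boundaries come out in increasing order
```

No equality of the intervals is assumed: the boundaries are estimated from the data, which is the point of the method as described above.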

This is not the occasion to discuss the psy- 
chological scaling problem except to indicate 
the general nature of that problem. In psychological
measurement we differentiate between
two types of scales, the objective physical
scale and the subjective or psychological
scale. The physical or objective scale is the 
type that we all know. The unit of measure- 
ment is some physical unit such as the inch, 
the gram, or the liter. The subjective or psy- 
chological scale represents equal steps in ap- 
parent or perceived magnitude as contrasted 
with the physically measured magnitudes of 
the stimuli. Sometimes we refer to the physi- 
cal magnitudes as real magnitudes, but one 
could insist that the real magnitude is that 
which the subject actually experiences when 
he does make the differentiation. 

Let us consider a simple example to illus- 
trate the difference between the physical scale 
and the psychological scale. Suppose that we 
place before the subject two standard weights
which differ markedly. They might be 50 
grams and 500 grams. Now suppose that the 
subject has available a large number of inter- 
mediate weights and that we ask him to select 
one which seems to him to lie midway between 
the two standard weights. When he finds a 
weight that seems to him to lie midway be- 
tween the two given standard weights, we can 
determine the physical magnitude of this inter- 
mediate weight. It will then be found that 
the intermediate weight which seems to the 
subject to lie midway between the two stand- 
ards is physically closer to the lighter standard 
than to the heavier one. On the psychological 
scale the three weights are equally spaced be- 
cause the subject perceives the two intervals 
to be equal. On the physical scale, the three 
weights are not equally spaced. The lower 
interval is smaller than the upper one. In this
case we expect to find that the psychological
scale is a logarithmic function of the physical 
scale. 
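If the subjective scale is taken to be exactly logarithmic (an idealization; actual judgments only approximate it), the weight that seems midway between the two standards is their geometric mean. The sketch below works this out for the 50-gram and 500-gram standards.

```python
import math

def log_scale_midpoint(light: float, heavy: float) -> float:
    """Physical weight halfway between two standards on a logarithmic subjective scale.

    Equal steps in log(weight) make the midpoint the geometric mean.
    """
    return math.exp((math.log(light) + math.log(heavy)) / 2)

mid = log_scale_midpoint(50, 500)
print(round(mid, 1))         # 158.1 grams
print(mid - 50 < 500 - mid)  # True: physically closer to the lighter standard
```

The arithmetic midpoint would be 275 grams, so on this assumption the apparent midpoint lies well toward the lighter weight, as the text states.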

When we deal with problems of discrimina- 
tion that are concerned with food, or indeed 
with accepting and rejecting other commodities, 
we are dealing with the apparent magnitudes,
the subjective scale, because that is the scale
that actually functions when we express preferences
among several competing objects.

One of the central problems in psychological
measurement is to determine the ratio of the
successive intervals. This problem has been
solved so that we can speak of a subjective
metric with a subjective unit of measurement
and not merely about a set of rank orders.

There is one concept in psychological measurement
to which I should like to draw your
special attention because it can have far-reaching
effects in many practical problems. The
concept is rarely understood by those who have
the responsibility for making decisions in which
this concept should play a part. I am referring
to the concept of discriminal dispersion,
and I shall indicate some of its curious effects.

Consider two objects which might as well be
food products in the present context. Let 
these two objects have the same average 
popularity. Let us suppose that they differ in 
one important respect, namely, that object A 
has a wide range of values in the population so 
that some people are enthusiastic about it 
whereas others have intense dislike for it.
We say that such an object has a large dis- 
criminal dispersion. Let object B have a 
small dispersion. This means that nobody is 
very enthusiastic about it and no one has any 
violent dislike for it. It can easily be seen 
that two objects can differ in this way in the 
range of feeling that people have for or against 
them in spite of the fact that the two objects 
have the same average popularity. We should 
expect food products like roast beef to have a 
rather small discriminal dispersion whereas 
oysters would have a large dispersion. 

Let us suppose that we have six objects of 
equal average popularity. Let the object A 
have large discriminal dispersion, and let the 
other five objects have vanishingly small dis- 
persions so that people are fairly well agreed 
about them. If these six objects are presented 
to the population and if each person chooses 
one of the objects, then the object A gets one- 
half of the votes while the other five objects 
get only 10 per cent each. It is quite likely 
that some of the errors in the prediction of 
elections have been caused by the striking 
effects of differences in discriminal dispersions. 
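The six-object example can be checked numerically. The sketch below assumes that each person's hedonic value for each object is drawn independently from a normal distribution, which is one common way to model discriminal dispersion; the sample size and random seed are arbitrary choices.

```python
import random

def simulate_first_choices(means, sds, n_subjects=20000, seed=1):
    """Monte Carlo first-choice shares under independent normal discriminal processes.

    Each simulated person draws one hedonic value per object from
    Normal(mean, sd) and chooses the object with the largest value.
    """
    rng = random.Random(seed)
    wins = [0] * len(means)
    for _ in range(n_subjects):
        values = [rng.gauss(m, s) for m, s in zip(means, sds)]
        wins[values.index(max(values))] += 1
    return [w / n_subjects for w in wins]

# Object A (index 0) has a large dispersion; the other five are nearly unanimous.
shares = simulate_first_choices([0.0] * 6, [1.0] + [1e-6] * 5)
print([round(s, 2) for s in shares])  # roughly [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]
```

A wins whenever a person's value for it lies above the common mean, which happens half the time; the remaining half of the votes is split evenly among the other five.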

The theory of this problem is very much the
same whether we consider the popularity and 
dispersions of political candidates or of any 
other competing objects that are to be chosen. 

If we have a record of preference judgments 
from an experimental population about each 
one of a large number of foods which might be 
listed merely by name, then it should be possi- 
ble to predict the proportion of the population 
that will select each item as its first choice when 
it is presented with any given set of alternatives. 
For example, if any arbitrary list of four, five, 
or six of these foods is presented for choice, it 
should be possible to predict the proportion of 
the subjects who will select each of those foods. 
This kind of prediction has been made on other 
types of material, and there should be no rea- 
son why the same principle should not be ex- 
tended to the prediction of choice in foods. 
Here also the concept of discriminal dispersion 
plays an important role which market research 
men could make use of if they knew about it. 
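A minimal sketch of this kind of prediction, assuming that each subject's recorded rating can be used directly as that subject's hedonic value and that each subject chooses the highest-rated food offered. This is a direct count over the survey record rather than Thurstone's normal-curve theory; the function name, the food "ham", and all ratings are invented for illustration.

```python
from collections import defaultdict

def predicted_first_choice_shares(ratings, offered):
    """Predict first-choice shares for any offered set from a ratings record.

    ratings: one dict per subject, mapping food name -> rating (higher = better liked).
    offered: the foods presented together.
    A subject's tied top foods share that subject's vote equally.
    """
    shares = defaultdict(float)
    for person in ratings:
        best = max(person[f] for f in offered)
        top = [f for f in offered if person[f] == best]
        for f in top:
            shares[f] += 1 / len(top)
    return {f: shares[f] / len(ratings) for f in offered}

# Hypothetical ten-step ratings from four subjects.
ratings = [
    {"roast beef": 7, "oysters": 9, "ham": 6},
    {"roast beef": 7, "oysters": 2, "ham": 7},
    {"roast beef": 6, "oysters": 1, "ham": 5},
    {"roast beef": 6, "oysters": 8, "ham": 6},
]
print(predicted_first_choice_shares(ratings, ["roast beef", "oysters", "ham"]))
print(predicted_first_choice_shares(ratings, ["roast beef", "ham"]))
```

Because the record covers every food, any list of alternatives can be evaluated without a new experiment. In this toy record the polarizing oysters take half the first choices in the three-way offering.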

If we have the problem of presenting, say, 
five objects out of a total list of 50 or 100 ob- 
jects with the purpose of maximizing the num- 
ber of individuals who will select one of the 
five, then there is a fairly definite analysis that
should be made. The first impulse is usually 
to select the five most popular objects from the 
whole list and to present these five, but that is 
the wrong answer. Although this is the 
wrong answer, I suspect that such a solution is 
often attempted. The number of choices from 
a population depends on the dispersions of 
hedonic values and also on the correlations 
among these items. It usually happens that 
those who prefer item A are likely to prefer 
some other item, B, and that those who prefer 
item C dislike item E and so on. Merely to 
select the five most popular items from the 
total available list usually means that many 
persons will find several items that please them 
so that they have difficulty in making up their 
minds, while others will find nothing that 
pleases them in the presented list of five objects.

A better solution can be described briefly.
We start assembling the required list of five
objects by selecting the most popular single
item A from the available set. We remove 
from the experimental population all those 
who did choose A. Now we ascertain the 
most popular object B in the remainder of the 
experimental population and B is then added 
to the collection. Again we remove from the 
survey population all who did select B. Now 
we ascertain the most popular object C in the 
remainder of the survey population. After 
removing all who did select object C, we might 
discover that most of the survey population 
has been accounted for. If this should happen, 
then we would have discovered not only which 
objects to present so as to maximize accept- 
ance, but also the number of objects that are 
necessary to attain the required degree of ac- 
ceptance which might be, say, 90 per cent of 
the population. The number of rejections 
will be smaller with this solution than if we 
presented the five most popular objects from 
the whole available supply. I am venturing 
to guess that this principle has not been ap- 
plied in dealing with psychophysical problems 
concerned with food, but I see no reason why 
it should not apply in this field. 
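The successive-removal procedure can be sketched as follows. The sketch assumes that "choosing" an object means the object is in a subject's set of acceptable foods (the text does not fix how acceptance is defined, for instance by a rating threshold), and it breaks ties alphabetically; both are choices of this illustration, and the data are hypothetical.

```python
def greedy_menu(accepts, max_items, target=1.0):
    """Build a menu by successively removing subjects already satisfied.

    accepts: one set per subject of the foods that subject would choose.
    Stops after max_items foods, or once the target share of subjects is satisfied.
    Returns the menu and the share of subjects who find something acceptable.
    """
    n = len(accepts)
    remaining = list(range(n))  # subjects not yet satisfied by the menu
    menu = []
    while remaining and len(menu) < max_items:
        counts = {}
        for i in remaining:
            for food in accepts[i]:
                counts[food] = counts.get(food, 0) + 1
        if not counts:
            break  # nobody left accepts any food on the list
        # Most popular food among the subjects still unsatisfied (ties: alphabetical).
        best = min(counts, key=lambda f: (-counts[f], f))
        menu.append(best)
        remaining = [i for i in remaining if best not in accepts[i]]
        if 1 - len(remaining) / n >= target:
            break
    return menu, 1 - len(remaining) / n

# Ten hypothetical subjects: those who like B all like A as well.
accepts = [{"A", "B"}] * 5 + [{"A"}] + [{"C"}] * 4
print(greedy_menu(accepts, 2))  # (['A', 'C'], 1.0)
```

Presenting the two individually most popular foods, A and B, would satisfy only 6 of these 10 subjects, because B appeals to the same people as A.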

The traditional psychophysical problems 
have been concerned with the establishment of 
the subjective scale, but we have considered 
here briefly an obverse problem, namely, to 
predict from such a scale what people will do. 
This is the problem of prediction of choice. The
problem that we have called the prediction of 
choice is that of predicting the proportion of 
the population that will select each particular 
competing object when a survey population 
has expressed its preferences about each of the 
separate objects. There is a related problem 
to which I have recently given considerable 
thought. I am calling this problem the prediction
of purchase, which is distinct from the prediction
of choice. If all of the available objects
are of the same price, then the theorems
about the prediction of choice can be applied 
over the whole range of objects. If the objects 
differ in price, we have, of course, different 
degrees of inhibition on free choice. The 
motivation to a purchase can be thought of as 
the algebraic sum of the hedonic value of the 
object and the negative hedonic value of the 
price. We are working on several psycho- 
physical theorems concerned with the predic- 
tion of purchase which is, of course, a much 
more difficult problem than the prediction of 
free choice among the available objects. In 
these psychophysical problems we are not con- 
cerned with the prediction of total volume be- 
cause that is for economists to worry about. 
Given, however, the expected total volume, it 
is largely a psychophysical problem to ascer- 
tain how the total volume is divided among 
competing objects. It seems quite likely that 
psychophysical theories will find useful applica- 
tion in the solution of important problems of 
this kind. I see no reason why economics 
should not be developed along these lines as an 
experimental science. 
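The theorems on purchase are not stated here. As a hedged illustration only of the "algebraic sum" idea: suppose the price carries the same negative hedonic value for everyone and the object's hedonic value is normally distributed in the population. The share with positive net motivation then follows from the normal curve. The function and all numbers below are assumptions of this sketch.

```python
from statistics import NormalDist

def share_willing_to_buy(mean_hedonic, dispersion, price_cost):
    """Share of the population whose net motivation to buy is positive.

    Net motivation = hedonic value of the object - hedonic cost of the price,
    with the object's hedonic value assumed Normal(mean_hedonic, dispersion)
    and the price cost assumed identical for everyone.
    """
    return 1 - NormalDist(mean_hedonic, dispersion).cdf(price_cost)

# Two hypothetical objects of equal average appeal, priced above that average.
print(round(share_willing_to_buy(1.0, 0.5, 2.0), 3))  # 0.023 (small dispersion)
print(round(share_willing_to_buy(1.0, 2.0, 2.0), 3))  # 0.309 (large dispersion)
```

Under these assumptions a large discriminal dispersion helps an object whose average appeal falls short of its price, which is another of the curious effects of dispersion described above.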


Summary 


In this paper I have tried to describe briefly 
the nature of experimental work on taste and 
smell with special reference to judgments of 
preference. The conventional psychophysical 
methods which involve paired comparison 
seem to be less desirable in this field than the 
more general method of successive intervals 
even though this method does not claim to be 
so refined as the traditional methods. If our 
purpose is to ascertain the frequency distribu- 
tions of hedonic values of different foods, then 
it seems best to use as experimental subjects 
random samples from the population for whom 
the foods are manufactured. It also seems better
to obtain in a similar manner a single
judgment from each subject about each one of 
a large number of foods, presented either by 
name or by actual specimens, than to confine 
attention to the work of a small group of expert 
tasters. In the inevitable practical problems 
of predicting consumption among competing 
food items, it should be profitable to take into 
consideration the discriminal dispersions as 
well as the inter-food correlations. These cor- 
relations can also be obtained from the data 
that are assembled by the method of successive 
intervals. The expert food taster still has an 
important function to supplement other labor- 
atory analyses. 

The practical problems can be classified in 
several groups. First is the problem of deter- 
mining the effect of various conditions of 
manufacture and storage and containers on 
food acceptance. Another problem is the prediction
of relative volume among different items
as determined by experiments on food accept- 
ance. Another problem is that of combining 
food items so as to maximize acceptance 
among competing items which differ in dis- 
criminal dispersions. It seems likely that 
formal experimentation in this field can pro- 
duce more significant and useful results than 
simple frequency counts at the descriptive 
level. 


Received July 7, 1950. 




Measurement of Supervisory Ability* 


Omer R. Jones and Karl U. Smith 


University of Wisconsin 


A systematic approach to the analysis of 
supervisory and executive work indicates that 
such tasks generally involve a fourfold division 
of performance as follows: (1) knowledge of 
jobs supervised, (2) knowledge of technological 
controls applied to the jobs, (3) executive 
operations in planning and making decisions 
in the application of management practices to 
the work situation, and (4) leadership per- 
formance in dealing with the workers super- 
vised. For purposes of this study, we have 
designated factors 3 and 4 by the terms “ex- 
ecutive performance” and “leadership per- 
formance” respectively. Typically, these two 
aspects of supervisory work may be thought of 
as human relations performances covering 
most types of supervision in industry, whereas 
the first two factors are usually very specific 
to the work of a given department in an organ- 
ization. 

The complexity of the related performances 
in supervisory work, which consist of system- 
atic links with various parts of an organization, 
dictates that criterion measures of such work 
be complex in nature. In accordance with 
this fact, an attempt has been made in this 
study to investigate two different criteria of 
performance in supervisory work, and to de- 
velop capacity scales which may have signifi- 
cance in forecasting ability in supervisory 
tasks. The present paper describes data ob- 
tained in two different business organizations 
in which criterion measures of executive per- 
formance and leadership performance in super- 
vision were secured. 


* This research has been supported by the Research 
Committee of the Graduate School of the University of 
Wisconsin from funds provided by the State Legislature 
for 1949-50. The Supervisory Inventory can be made 
available on the basis of cooperative research with the 
Bureau of Industrial Psychology of the University of 
Wisconsin. The writers wish to thank Mr. B. W. 
Ellsom, Personnel Manager, Boston Store, Milwaukee, 
Wis., and Mr. Arnold Nielson, Personnel Manager, 
Milwaukee Electric Company, Milwaukee, Wis., for 
their [illegible] and direct aid in the conduct of this
research. 


Methods 


A criterion measure of executive performance. 
A fairly clear criterion measure of executive 
efficiency may be based upon ratings or judg- 
ments of performance involving interplay be- 
tween different levels of supervision in a plant. 
These ratings of lower level supervision are 
usually made by higher level supervisors or 
managers. Typical supervisory ratings of 
general psychological capacities are inadequate 
for defining our concept of executive efficiency, 
but ratings based largely on potential capacity 
in supervisory work do come close to this defi- 
nition. The cooperation of an industry in 
which the latter type of rating had been re- 
fined and employed for some time was secured 
in order to conduct this study. 

In this industry¹ a supervisory rating system
was employed in which individual supervisors 
were rated on a twenty-point scale. This 
rating scale is entitled “general potential ca- 
pacity for supervisory responsibility.” Five 
defined levels appear in this scale ranging from 
foreman level to company officer level. Rat- 
ings are made by indicating the maximum po- 
tential level a given man may be expected to 
achieve after appropriate experience and train- 
ing. Five superiors rate each man doing super- 
visory work. In order to adapt these ratings 
to the present work, the ratings obtained from 
the five judges were averaged to secure an 
overall rating for each man in the subject 
group. Seventy-two supervisors employed in 
transit work constituted the subject group. 
On the basis of the overall ratings these men 
were later divided into three subject groups 
designated as “superior” (Group I), “average” 
(Group II) and “below average” (Group III). 
Group I contained 20 men, Group II 22 men, 
and Group III 30 men. 
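The averaging and grouping step might be sketched as below. The paper reports group sizes of 20, 22, and 30 but not the cutoffs used, so splitting strictly by rank into groups of given sizes is an assumption of this sketch. The function names and the toy ratings for six supervisors are hypothetical.

```python
from statistics import fmean

def overall_ratings(ratings_by_judge):
    """Average each supervisor's ratings over the judges.

    ratings_by_judge: dict mapping supervisor id -> list of ratings (one per judge).
    """
    return {sup: fmean(r) for sup, r in ratings_by_judge.items()}

def split_by_rank(overall, sizes):
    """Split supervisors, best-rated first, into consecutive groups of the given sizes."""
    if sum(sizes) != len(overall):
        raise ValueError("group sizes must add up to the number of supervisors")
    ordered = sorted(overall, key=overall.get, reverse=True)
    groups, start = [], 0
    for size in sizes:
        groups.append(ordered[start:start + size])
        start += size
    return groups

# Toy ratings on a twenty-point scale from five judges.
toy = {
    "s1": [15, 16, 14, 15, 15], "s2": [8, 9, 7, 8, 8], "s3": [12, 11, 13, 12, 12],
    "s4": [5, 6, 5, 4, 5], "s5": [18, 17, 19, 18, 18], "s6": [10, 10, 11, 9, 10],
}
print(split_by_rank(overall_ratings(toy), (2, 2, 2)))
# [['s5', 's1'], ['s3', 's6'], ['s2', 's4']]
```

With the actual 72 supervisors the call would be `split_by_rank(overall, (20, 22, 30))`, giving Groups I, II, and III in that order.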

To secure a criterion of leadership perform- 
ance, procedures similar to those just described 
were followed with respect to employee ratings 
of supervisors. 


1 Milwaukee Electric Company, Milwaukee, Wis. 


Design of a capacity-type inventory for super- 
visory ability. In order to measure some as- 
pects of supervisory ability, we have developed 
an inventory of multiple-choice items entitled 
the “Supervisory Inventory.” The initial 
steps in the design of this inventory were made 
by Osterberg (2), who made a preliminary 
validation of some 45 items covering situational 
problems in supervision in industry. Limita- 
tions found in these items suggested extension 
of the battery to include items covering not 
only additional problematic situations in super- 
vision but also items related to the personal 
orientation of the supervisor toward others and 
items concerning personal background of the 
subject. In this extension of the original 
battery of items, a total of 219 multiple-choice 
questions were arranged to be taken as a 
capacity-type examination utilizing separate 
answer sheets and electromatic scoring. 

Examples of the types of items used in the 
inventory are as follows: 


a) Personality item. 
Your social life in the past few years has 
become: 
1. more active and interesting to you; 
2. more active but less interesting to 
you; 
3. disappointingly less active than when 
you were younger; 
4. enjoyably less active than when you 
were younger. 
b) Personal history item. 
When in grade school I attended: 
1. only one school; 
2. two schools; 
3. three schools; 
4. four or more schools.
c) Problematic item. 
If a few workers in your department were 
needed to work overtime, you would: 
1. select the men with the most senior- 
ity; 
2. select the most efficient workers; 
3. give the work to the men with the 
least overtime; 
4. let the workers decide who should 
stay. 


The three different types of items described 
were arranged at random in this form of the 
inventory with appropriate directions for administration
as a group test.

In the main study under discussion, this 
inventory was administered to the 72 super- 
visors by the training director in charge of 
supervision in the organization. The ratings 
were secured at approximately the same time, 
and recorded. The test results² and ratings,
thus secured, formed the basis of the statistical 
analysis described below, which had the ob- 
jectives of establishing: (a) the correct re- 
sponse on each significant item; (b) the 
statistical significance of each item; and (c) 
selection of a battery of significant items for a 
revised form of the inventory. 

Responses of the test group were analyzed 
by means of a modified chi-square formula (1) 
in order to determine which response for each 
item distinguished between Group I and 
Group III. The modification of the formula 
employed is made to adjust for inequality in 
numbers in the group and reduces the apparent 
significance of the response statements. Ac- 
cordingly, all response statements which 
differentiated between the two groups at the 
20 per cent level of confidence were considered
useful. It was also decided to include ques- 
tions in which response statements differenti- 
ated at the 30 per cent level of confidence pro- 
vided 25 per cent of one group chose that 
particular response. 
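The modified chi-square formula cited above is not reproduced in the text, so the sketch below uses the ordinary Pearson chi-square for a 2 × 2 table (chose the response or not, Group I versus Group III) and should not be read as the authors' formula. Reading the "20 per cent level" as p ≤ .20 and "25 per cent of one group" as at least 25 per cent are interpretations assumed here; the counts in the example are hypothetical.

```python
import math

def chi_square_2x2(chose_1, n_1, chose_3, n_3):
    """Ordinary Pearson chi-square (1 df) for one response option, Group I vs. Group III.

    Returns (chi_square, p_value). This is NOT the modified formula used in the study.
    """
    a, b = chose_1, n_1 - chose_1
    c, d = chose_3, n_3 - chose_3
    n = n_1 + n_3
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty margin leaves nothing to compare
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def keep_response(chose_1, n_1, chose_3, n_3):
    """Selection rule as read from the text: p <= .20, or p <= .30 when
    at least 25 per cent of one group chose the response."""
    _, p = chi_square_2x2(chose_1, n_1, chose_3, n_3)
    popular_enough = max(chose_1 / n_1, chose_3 / n_3) >= 0.25
    return p <= 0.20 or (p <= 0.30 and popular_enough)

# Hypothetical: 15 of the 20 men in Group I and 10 of the 30 in Group III chose a response.
chi2, p = chi_square_2x2(15, 20, 10, 30)
print(round(chi2, 2))               # 8.33
print(keep_response(15, 20, 10, 30))  # True
```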

A preliminary investigation of the external 
consistency of the total battery of significant 
items was conducted in terms of point biserial 
coefficients of correlation (4) between scores of 
Groups I and III and triserial coefficients of
correlation (3) between scores of Groups I, II 
and III and the original ratings. Separate 
biserial and triserial coefficients were also 
determined for specific categories of problem- 
atic items, personality items, and personal 
history items. 
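For reference, the point biserial coefficient is the product-moment correlation between the test scores and a 0/1 indicator of criterion group. The sketch below uses the textbook formula with the population standard deviation; the function name is hypothetical, and the triserial coefficient is not sketched here.

```python
import math
from statistics import fmean, pstdev

def point_biserial(scores, in_upper_group):
    """Point biserial r between test scores and a two-group criterion.

    in_upper_group[i] is True for the higher criterion group (e.g. Group I)
    and False for the lower one (e.g. Group III).
    r = (M_upper - M_lower) / SD_all * sqrt(p * q), with SD_all the population SD.
    """
    hi = [s for s, g in zip(scores, in_upper_group) if g]
    lo = [s for s, g in zip(scores, in_upper_group) if not g]
    p = len(hi) / len(scores)
    return (fmean(hi) - fmean(lo)) / pstdev(scores) * math.sqrt(p * (1 - p))

# Hypothetical inventory scores for eight supervisors and their criterion groups.
scores = [62, 58, 55, 49, 47, 44, 40, 38]
in_upper_group = [True, True, True, False, True, False, False, False]
print(round(point_biserial(scores, in_upper_group), 2))
```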


Results 


Item content of the revised scale: Seventy
questions were found to be usefully significant
in relation to the criterion of executive performance.
Examination of the 70 significant
items revealed that 33 problematic items, 27
personality items, and 10 personal history items were
included in the group meeting the confidence
criteria adopted. The distribution of scores
on all of these items by the subject groups shows
very little overlap between scores of Group I
and Group III. Scores of Group II, however,
overlap considerably with those of Groups I
and III.

2 Compilation of the results was performed with the
aid of the Computing Service of the University of
Wisconsin.

Reliability and validity of the revised scale: 
In order to determine the external consistency 
of the revised scale with respect to the criterion 
of executive performance, scoring keys were 
constructed for the total inventory (i.e., the 70 
significant items) and for each type of question, 
as described above. Point biserial coefficients 
of correlation and triserial coefficients of cor- 
relation, obtained in determining the validity 
of each group of questions as well as the validity 
of the scale as a whole, are summarized in 


Table 1 


The External Consistency of the Revised Inventory 
with the Criterion Ratings 


Omer R. Jones and Karl U. Smith 


of the scores for odd and even items. This 
measure, when adjusted in terms of the 
Spearman-Brown correction, has a magnitude 
of +.86. 

There has been no attempt in this study to 
go into the question of the reliability and 
temporal consistency of our criterion meas- 
ures, inasmuch as practical limitations on re- 
search in the industries involved in this work 
make it necessary to allocate this problem to 
future study when the scale is used on a practi- 
cal basis. It may be indicated again that the 
criterion ratings were secured as a part of 
regular management operations, the imple- 
mentation of which did not permit evaluation 
of consistency between raters for the whole 
group of supervisors. 

Table 1. These coefficients are based on segregation of groups of superior and below average supervisors in terms of the criterion ratings. This segregation for the triserial coefficients is made on the basis of groups of superior, average and below average supervisors. The external consistency of the separate groups of questions with the criterion ratings is low in the case of the personal history questions selected for the revised scale, and of a fair order of magnitude in the case of the problematic and personality items. The overall consistency of the revised scale with the criterion is +.46 in terms of the point biserial coefficient and +.45 when expressed in terms of the triserial coefficient.

Table 1

Type of Question              r (Point biserial)   r (Triserial)
Problematic questions               +.43               +.43
Personality questions               +.41               +.40
Personal history questions          +.16               +.16
Total scale                         +.46               +.45

A measure of internal consistency of the revised inventory was obtained for the total scale on the basis of segregation and correlation

A revised inventory of supervisory capacity based on leadership performance: The present study has been extended along the same lines to secure capacity measures of supervision derived from a criterion of leadership performance. Such performance has been measured in terms of attitudinal estimates of employees concerning supervisors in a large retail establishment.* Results of these studies have proven to be consistent with data found with the executive-type scale. When items were selected on the basis of the employee leadership criterion, 27 problematic items, 29 personality items, and 14 personal history items fell within the criterion of significance used. Few if any of these items were the same as those selected in terms of the executive performance criterion.

* Boston Store, Milwaukee, Wis.

Measurement of Supervisory Ability

Table 2 summarizes the relative levels of correlation, based on point biserial coefficients, between different types of items and the leadership criterion. In this case, each of the three types of items displays a correlation with the criterion exceeding +.60. The personality items give a correlation of +.73 with the criterion.

Table 2
The External Consistency of Different Types of Items
with the Employee Leadership Criterion

Type of Question              r (Point biserial)
Problematic                         +.69
Personality                         +.73
Personal history                    +.63
Total scale                         +.64
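The point biserial coefficients in Tables 1 and 2 relate a continuous inventory score to a two-way criterion split (superior versus below average supervisors). The paper does not describe its computing routine, so the following is only a minimal sketch of the standard textbook formula, r_pb = (M_high - M_low) / s * sqrt(pq), where s is the standard deviation of all scores and p, q are the proportions of cases in the two groups. The function name and the 1/0 group coding are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(scores, groups):
    """Point biserial r between a continuous score and a 0/1 criterion group.

    Hypothetical coding: groups[i] == 1 for a superior supervisor,
    0 for a below-average one.
    """
    if len(scores) != len(groups):
        raise ValueError("scores and groups must be the same length")
    if any(g not in (0, 1) for g in groups):
        raise ValueError("groups must be coded 0 or 1")
    high = [s for s, g in zip(scores, groups) if g == 1]
    low = [s for s, g in zip(scores, groups) if g == 0]
    if not high or not low:
        raise ValueError("both criterion groups need at least one case")
    p = len(high) / len(scores)  # proportion in the superior group
    q = 1.0 - p                  # proportion in the below-average group
    s = pstdev(scores)           # SD of all scores (divisor n, not n - 1)
    return (mean(high) - mean(low)) / s * sqrt(p * q)
```

With the divisor-n standard deviation used here, the result equals the ordinary Pearson r between the scores and the 0/1 codes, which makes a convenient cross-check.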

The corrected odd-even reliability of the 
scale of items based on the leadership criterion 
is +.90 as compared to the value of +.86 for 
the executive scale of items. 
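A "corrected odd-even reliability" is conventionally obtained by correlating scores on the odd-numbered and even-numbered items and stepping the half-test correlation up with the Spearman-Brown formula, r = 2r_h / (1 + r_h). The paper does not spell out its computation; the sketch below assumes that conventional procedure, and the function names and the item-matrix layout are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Ordinary product-moment correlation of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_brown(r_half, factor=2.0):
    """Step a reliability up to a test `factor` times as long."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def odd_even_reliability(item_scores):
    """item_scores: one list of item scores per person (hypothetical layout)."""
    odd = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5, ...
    even = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))
```

Running the formula backwards, r_h = r / (2 - r), shows that the reported +.90 and +.86 correspond to uncorrected half-test correlations of roughly .82 and .75.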

The interrelation between the two types of 
scales developed here is significant with re- 
spect to our definitions and theory about 
supervisory ability. The intercorrelation be- 
tween the two scales, as determined on an 
additional population of subjects, is +.15. 

Intercorrelations between types of questions in both the leadership and executive scales of items have been computed. These data are given in Table 3. Examination of Table 3 will show that the interrelation between questions in the executive scale is barely significant except for the correlation between problematic and personality items. Intercorrelation values for types of questions in the leadership scale are generally higher than those found for the executive scale. For both scales the problematic and personal history items give the lowest intercorrelations.

Table 3
Intercorrelation Values between Different Types of
Items in the Revised Scale

                                         r                   r
Types of Items                    (Executive Scale)   (Leadership Scale)
Problem vs. personality                +0.65**             +0.47**
Problem vs. personal history           +0.22               +0.35**
Personal history vs. personality       +0.25*              +0.47**

* Significantly different from 0 at the .05 level.

** Significantly different from 0 at the .01 level.


Summary 


A general inventory of multiple-choice items, 
constituting a capacity scale for the measure- 
ment and forecasting of potential performance 
in industrial supervisory work, has been de-
veloped and administered in two large in-
dustries.

This general inventory has been checked 
against a criterion measure of supervisory 
ratings, as made by superior officers in a com- 
pany, in order to determine significant items 
in the inventory, the definition of the correct 
response for each significant item, and the 
level of external consistency between the 
refined inventory and the criterion. 

Results indicate that the multiple-choice 
questions of the sort used here can be observed 
to give significant correlations with the cri- 
terion of “executive performance” employed. 
Among the types of questions used, problematic 
items and personality questions showed the 
highest relation (triserial r = +.43 for prob-
lematic items and +.40 for personality items),
while the personal history questions showed the 
lowest relation (triserial r = +.16) with the 
criterion ratings. The external consistency 
of the total battery of questions in the refined 
inventory gives a triserial correlation of +.45 
with the rating criterion. 

Results are indicated in regard to develop- 
ment of a scale of questions based on a criterion 
of “leadership performance” in supervision, 
as measured in terms of employee attitudes to-
ward supervision. The scale thus derived
shows significant differences from the scale 
based on ratings of supervisors by superior 
officers. The correlation of the “leadership” 
scale with its specific criterion is considerably 
higher than that obtained with the scale of execu-
tive ability. These results suggest that such
an approach to measurement and prediction 
of supervisory performance may validly be 
broadened to include measures other than 
those based on opinions of management officers. 

The measures of external consistency of the 
executive and leadership scales indicated in 
this report offer evidence as to the relative 
significance of different types of inventory 
items for prediction of supervisory ability. 
These measures do not define a property of 
these scales suggestive of their valid use in in- 
dustries other than those in which the work has 
been done. 

The study here reported offers evidence of 
some possible future results, both of practical 
Measured Changes in Acceptance of an Employee Publication*


E. B. Knauft 
E. I. duPont de Nemours and Co. Inc., Wilmington, Delaware 


Communication between management and 
employees is particularly difficult in chain 
organizations which are spread over a large 
geographical area. Although the company in 
question is fortunate to have only 10 to 50 em-
ployees in each unit, it is faced with the prob-
lem of effectively communicating with 1400
employees in 91 units spread through 24 states. 
Since each unit is a retail manufacturing bake 
shop, the operations and problems of any one 
unit are very similar to those of all the other 
units. The company management has long 
recognized the value of a house organ and a 
weekly employee publication has been used 
continuously since 1925. 

The present investigation evolved from a 
desire by management to improve the effective- 
ness and acceptability of the publication. A 
content and readability analysis of the publica- 
tion, together with a readership survey, formed 
the basis for a complete revision of the publica- 
tion. After a period of six months the same 
type of analysis and survey were repeated to 
determine the extent of the revision and its 
effect on the employee audience. 


The First Analysis 


An effort was made to analyze certain as- 
pects of the publication which might influence 
its value as a communications medium between 
home office management and all levels of field 


and store employees. The following aspects 
of the house organ were analyzed: (1) the 
proportion of space in each issue devoted to 
various topics; (2) the percentage of space de- 
voted to photographs; (3) the reading ease; 
and (4) the human interest. Twenty-one 
weekly issues from January 29 through June 
18, 1949 were used in measuring all four of the 
above variables. 

The content analysis followed approximately the procedure outlined by Raney (3), based on a count of the number of column inches in each issue devoted to various major topics. Headlines and photographs were included in this content analysis. It was not practical to follow Raney’s categories because the present material did not divide itself into a very large variety of topics. A general division was therefore made into the four categories suggested by Paterson and Walker (2): personals, informative, descriptive and historical. “Personals” consist of gossip about particular employees; “informative” refers to factual material such as merchandising plans for sales personnel and practical hints for bakers; “descriptive” material refers to straight news reporting; “historical” refers to items of company history. On this basis, it was found that space in the publication was distributed as follows: personals, 66.6%; informative, 28.2%; descriptive, 5.2%; and historical, 0%. This distribution appeared to indicate that too much space was being devoted to personals and too little to descriptive and historical material. The personals usually were of interest only to employees in one particular unit.

* This research was conducted while the author was associated with Federal Bake Shops, Inc., Davenport, Iowa.
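The space distribution reported above is a simple proportion of measured column inches. A minimal sketch, assuming the column inches for each category have already been measured and summed over the issues studied; the function name and the dictionary input format are hypothetical.

```python
def space_percentages(column_inches):
    """Convert summed column inches per category into per cent of total space."""
    total = sum(column_inches.values())
    if total <= 0:
        raise ValueError("total column inches must be positive")
    # Headlines and photographs are assumed to be already included in each
    # category's total, as in the content analysis described above.
    return {topic: 100.0 * inches / total for topic, inches in column_inches.items()}
```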

The second factor, reading ease, was meas- 
ured by Flesch’s revised formula (1) based on 
the number of syllables per 100 words and the 
average sentence length. The formula com- 
bines these two measures and yields a reading 
ease score which may fall between 0 (very hard) 
and 100 (very easy). A total of 50 samples of 
100 words each were used with the personal, 
informative and descriptive material each 
represented roughly in proportion to the space 
percentages given above. The personals were 
found to have a reading ease score of 73.7, the 
informative material a score of 65.3 and de- 
scriptive material a score of 62.3. The mean 
score for all 50 samples was 69.0. This score 
is at the top of the 60 to 70 range described by 
Flesch as “Standard” (represented by the 
“digest” type of magazine) and readable for 
persons with 7th or 8th grade educational 
ability. 
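Flesch’s revised (1948) reading ease formula is usually cited as RE = 206.835 - 0.846 wl - 1.015 sl, where wl is the number of syllables per 100 words and sl the average sentence length in words. The coefficients are not printed in this paper, so the sketch below assumes that standard published form; the function name is hypothetical.

```python
def flesch_reading_ease(syllables_per_100_words, words_per_sentence):
    """Flesch (1948) reading ease; higher scores mean easier reading.

    The formula is linear and not clamped, so very unusual passages can
    fall slightly outside the nominal 0-100 range.
    """
    return 206.835 - 0.846 * syllables_per_100_words - 1.015 * words_per_sentence
```

For example, a sample with 140 syllables per 100 words and sentences averaging 15 words scores 206.835 - 118.44 - 15.225 = 73.17, in Flesch’s "fairly easy" band (70 to 80).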

Table 1
Results of Part 1 of Readership Survey

                                   Per Cent of Respondents Checking Each Alternative
                                   Less Than 2 Yrs.   2 Yrs. Service   All
Question                           Service            and Over         Respondents
                                   N=209              N=152            N=361

1. Do you read the Weekly?
   Regularly                       52%                78%              63%
   Occasionally                    38                 22               31
   Seldom                                             0                5
   Never                                              0                1

2. Do you take the Weekly home for your family to read?
   Regularly
   Occasionally
   Seldom
   Never

3. Do you regard the present Weekly as:
   Excellent
   Good
   Fair
   Poor

The human interest factor was measured by Flesch’s formula, which is based on the percentage of “personal words” and the percentage of “personal sentences.” The data for this analysis were the same 50 samples of 100 words each that were used for the reading ease analysis. The human interest score can fall between 0 (dull) and 100 (dramatic). The personal material had a score of 38.2, the informative material a score of 29.2, and the descriptive a score of 27.4. The mean human interest score of all 50 samples was 33.2. Flesch states that scores between 20 and 40 indicate “interesting” material of the “digest” type.
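Flesch’s human interest formula is usually cited as HI = 3.635 pw + 0.314 ps, where pw is the number of personal words per 100 words and ps the number of personal sentences per 100 sentences. As with reading ease, these coefficients come from Flesch’s 1948 work rather than from this article, and the function names below are hypothetical.

```python
def per_hundred(count, total):
    """Rate per 100 units, e.g. personal words per 100 words."""
    return 100.0 * count / total


def flesch_human_interest(personal_words_per_100, personal_sentences_per_100):
    """Flesch (1948) human interest score: 0 = dull, 100 = dramatic (nominal range)."""
    return 3.635 * personal_words_per_100 + 0.314 * personal_sentences_per_100
```

A passage with 8 personal words per 100 words and 20 personal sentences per 100 sentences would score 3.635 × 8 + 0.314 × 20 = 35.36, inside the 20 to 40 "interesting" band mentioned above.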

The reading ease and human interest scores of the “personals” are of special interest because this material was written by employee salesgirl correspondents in the units. Thus the “personals” may be regarded as reflecting the actual readability level of the salesgirls who write them. However, it is probable that the writings of the salesgirl correspondents would achieve lower scores if their subject matter were more weighty than “straight gossip.”


The First Readership Survey 


The final phase of the first evaluation of the 
publication was a survey of employee readers 


by means of a questionnaire included in each 
copy of one issue of the publication. Usable 
questionnaires were returned by 361 employees 
from 75 units. This sample represents 26 per 
cent of all employees and 82.5 per cent of the 
units. No claim is made that this is a random 
sample of all company employees, but this was 
the most practical method to use under the 
circumstances. Because of the manner in 
which the sample was obtained, it is possible 
that members of this sample may have a more 
favorable attitude toward the publication than 
the total population of company employees. 

The answers of the respondents to three 
general questions are presented in Table 1. 
A breakdown was made of the responses of 
employees with less than two years service 
compared with those with two years and over. 
The right hand column gives the percentages 
for all respondents. Two years was taken as a
convenient dividing point because employees 
become eligible for the company profit sharing 
plan only after they complete two years service. 
Fifty-eight per cent of the respondents had 
less than two years service.¹

¹ Company personnel records show that 62.2 per cent of the entire employee population have less than two years service, so in this respect the sample and the population show close agreement.







It is apparent from Table 1 that the em- 
ployees with two years service and over read 
the publication more frequently, take it home 
more regularly and have a better opinion of it 
than employees with less than two years serv- 
ice. Sixty-three per cent of all respondents 
state that they read the publication regularly 
and 94 per cent of the respondents read it 
either regularly or occasionally. Although 
the present writer knows of no similar figures 
for the employee publications of other com- 
panies, it would appear that the frequency of 
readership is reasonably high for this sample. 
One important finding is that only 14 per cent 
of the respondents regard the publication as 
excellent while 26 per cent regard it as fair or 
poor, indicating considerable room for im- 
provement. 

The second part of the readership question- 
naire consisted of a list of 12 types of articles 
or features which either had appeared or might 
appear in the publication. The respondent 
was asked to rank these 12 types of articles in 
order of preference from one to twelve. The 
resulting ranking is presented in Table 2 with 
the items arranged in order of preference from
the best to the worst. Several items in this 
list had never appeared in the publication at 
the time of the first survey and in these in- 
stances the respondent had to guess the exact 
content implied by the name of that type of 
article or feature. It should be noted that 
“personals,” which occupied two-thirds of the 
space in recent issues, received fourth place. 
This ranking indicates that the respondents 
were definitely seeking more than store gossip 
in their publication. 

A comparison was made between the rankings 
of the 12 items made by the respondents with 
less than 2 years service and the other respond- 
ents. The two sets of mean rankings showed 
considerable over-all agreement (rho = .73). 
The principal disagreement centered around 
the ranking given the profit sharing item. 
The employees with less than two years 
service ranked profit sharing in ninth place 
while the other employees ranked it in third 
place. This discrepancy is to be expected 
because only the latter group of employees is 
eligible for profit sharing benefits. 
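The rho of .73 compares two orderings of the same 12 article types. The paper does not say how its rho was computed; a minimal sketch of Spearman’s coefficient, rho = 1 - 6Σd² / (n(n² - 1)), applied after converting each set of mean ranks into ranks, looks like this. Tied values receive the average of the ranks they span, in which case the formula is a close approximation rather than exact. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving ties their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation via 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Here x and y would be the 12 mean ranks from the two service groups, listed in the same article order.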


Table 2
Rank Order of Preference for Twelve Types of Articles

                                             Mean Ranks
                                      Respondents with
                                      Less Than 2     2 Yrs. Service   All
Types of Articles                     Yrs. Service    and Over         Respondents

Picture-feature Stories
  (About outstanding employees)       3.2             4.1              3.6
Questions and Answers
  (Covering production, merchandis-
  ing, profit sharing, etc.)          3.9             4.5
Salesgirls’ Column
Personals and Contributions
  (From field correspondents)
Articles About Profit Sharing Plan
Stories About Origin and Manufacture
  of Raw Materials We Use
News About New Products
Stories About Successful Competitors’
  Operations
Home Office News
Reminiscences of Old Timers
Stories About Company Officials
Excerpts from Weeklies of 10 and
  20 years ago
Applying the Findings


The content analysis, readability analysis 
and readership survey were taken into account 
by the editors in planning the following changes 
in the publication: 

1. The variety of topics in each issue was to 
be greatly increased. There was to be a 
marked decrease in “personals” accompanied 
by an increase in straight reporting of news 
stories. In addition, several new features 
were to appear in every issue or in alternate 
issues. These features included humorous 
anecdotes from old timers, a salesgirls’ column 
and photo-stories about outstanding employees. 

2. An effort was made to keep the reading 
ease and human interest of the material at an 
appropriate level. With a decrease in the 
proportion of personals and gossip, it was 
especially important to pay more attention to 
the readability of new material which may be 
more difficult to present. 

3. Numerous changes were made in format
whenever possible. It was planned to have a 
large photograph on the front page as well as 
to increase the quality and quantity of all 
photographs. All articles were to have short 
headlines, several styles of type were to be 
used and the regular feature columns were to
have distinctive headings.


The Second Analysis 


The various changes in the publication were instituted with the June 25, 1949 issue and continued throughout the remainder of the year. At the end of 1949 a second analysis of the publication was made and a second survey of readers was conducted to measure the extent of the changes in the publication and the effect of these changes on the employee readers. Since the second analysis paralleled the first one, it is possible to compare the two sets of data by means of Table 3.

The second content analysis was based on all the materials in 25 issues commencing with the June 25th issue. Reading ease and human interest scores, determined by Flesch’s method, were based on 60 samples of 100 words each. The means and standard deviations given in the last two rows of Table 3 are based on all samples and are not simple means of the figures in the table.

It is apparent from Table 3 that the allocation of space to the four major types of material was markedly altered at the expense of the “personals.” The new distribution of material seems to represent a well-balanced reading fare compared to the previous distribution. The mean Flesch reading ease score for all samples in the second analysis was almost identical with that of the first analysis, but there was a slight improvement in the mean human interest score in the second analysis. Both reading ease and human interest fall in the “digest” category in both analyses. This level is probably appropriate for the majority of employees on the basis of Flesch’s (1) statement that persons with 7th or 8th grade educational ability can read material with a reading ease score of 61 to 70. Since Flesch advises writing one level below that of the intended audience, there is still room for improvement in the present instance.

Table 3
Comparison of Certain Characteristics of the Publication
Before and After Changes in Content Were Made

                         First Analysis                   Second Analysis
Type of         Per Cent   Flesch    Flesch      Per Cent   Flesch    Flesch
Material        of         Reading   Human       of         Reading   Human
                Space      Ease      Interest    Space      Ease      Interest

Personals         66.6      73.7      38.2          3.9
Informative       28.2      65.3      29.2         18.4
Descriptive        5.2      62.3      27.4         71.3
Historical         0.0       --        --           6.4
Mean of
All Samples                 69.0      33.2                    68.6      35.9
S.D.’s                       9.1

Table 3 indicates that both the reading ease 
and human interest of “informative” and 
“descriptive” material were markedly im- 
proved in the second analysis. The means for 
all samples do not reflect this improvement be- 
cause the second content analysis reflects a 
great decrease in “personals.” It seems likely 
that “personals,” even if they were not written 
by salesgirls, might by their very nature tend 
to have higher reading ease and human interest 
scores than the other types of material. It was 
also found that 27 per cent of the space was de- 
voted to photographs of all kinds as compared 
with 16 per cent in the first analysis. 


The Second Readership Survey 


The second readership survey was conducted 
approximately six months after the first changes 
were made in the publication. This survey was 
again in the form of a questionnaire included 
as a part of one issue of the publication. A 
total of 162 usable questionnaires were re- 
turned from 22 units. This represents 12 per 
cent of all employees and 24 per cent of the 
units. Although the second survey is based 
on a smaller sample than the first survey, 
there is evidence that the two samples are 
similar in two respects. Fifty-eight per cent 
of the respondents in the first survey had less 
than two years company service compared 
with 54 per cent in the second survey. Simil- 
arly, the sample in the first survey was com- 
posed of 37 per cent bakers and 43 per cent 
salesgirls while the second sample contained 
33 per cent bakers and 45 per cent salesgirls. 

Table 4 affords a direct comparison of the 
responses to the first three questions on the 
first and second surveys. This table indicates 
that there has been practically no change in 
the regularity with which the publication is 
read. However, there is a statistically sig- 
nificant increase in the percentage of respond- 
ents taking the publication home “regularly” 
for the family to read. There is also a statisti- 
cally significant increase in the percentage of 
respondents regarding the publication as 
“excellent” and a corresponding drop in the 
percentage regarding it as “fair.” These findings indicate that, for the samples measured, there was a marked improvement in general attitude toward the publication but no increase in frequency of readership. It might be hypothesized that those who read the publication fairly regularly were more prone to respond to both the first and second questionnaires and that within this group there was an improvement in general opinion toward the publication. This improvement is reflected by the responses to both questions 2 and 3.

Table 4
Comparison of First and Second Readership Surveys

                                    Per Cent of Respondents
                                    Checking Each Alternative
Question                            First Survey   Second Survey
                                    N=361          N=162

1. Do you read the Weekly?
   Regularly                        63%            64%
   Occasionally                     31             32
   Seldom                           5              4
   Never                            1              0

2. Do you take the Weekly home for your family to read?
   Regularly
   Occasionally
   Seldom
   Never

3. Do you regard the present Weekly as:
   Excellent                        14%            27%*
   Good                             60             60
   Fair                             25             12*
   Poor                             1              1

* Difference in percentages significant at 1% level of confidence.
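Table 4 flags differences significant at the 1 per cent level, but the test used is not named. One common choice is the z test for two independent proportions with a pooled standard error, sketched below under that assumption. It treats the two surveys as independent samples, which may not strictly hold if, as hypothesized above, many of the same regular readers answered both questionnaires. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """z test for the difference between two independent proportions.

    Uses the pooled proportion for the standard error and returns
    (z, two-tailed p). Proportions are fractions, e.g. 0.14 for 14 per cent.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value
```

With the "excellent" percentages (14 and 27 per cent, N = 361 and 162) this gives z of about 3.6, beyond the two-tailed 1 per cent point of 2.58; the 63 versus 64 per cent for regular reading gives z of only about 0.2.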


Comparison with Other Studies 


Data from two other studies of employee 
publications may be compared with the pres- 
ent findings. Paterson and Walker (2) meas- 
ured the readability of 34 different house or- 
gans in Minnesota. Using the same Flesch 
technique as that applied in the present study, 
they found these house organs had a mean 
reading ease score of 47.6 and a mean human 
interest score of 26.2. These figures compare 
with a reading ease of 68.6 and a human inter-
est of 35.9 in the present study. Paterson
and Walker reported a higher human inter- 
est score for “personals” but lower scores for 
informative, descriptive and historical material 
than those of the present second analysis. 

Raney (3) measured the readability of issues 
of 27 different employee newspapers of a single 
large company. He does not give mean scores 
for these publications but presents his data as 
percentages of samples earning various reading 
ease scores. He found that 66 per cent of his 
samples had a reading ease score below 60, 
whereas the present second analysis indicates 
only 12 per cent of the samples studied had 
reading ease scores below 60. Material with 
scores below 60 may be classified as very hard 
(0-29) or hard (30-59) to read. 


Summary 


The employee publication of a bakery chain 
was evaluated by means of: (1) a content 
analysis of a large number of issues; (2) Flesch 
counts for reading ease and human interest; 
and (3) a readership survey of a sample of em- 
ployees. The results of this evaluation formed 
the basis for a number of changes in the publi-
cation. The principal changes were: (a) a
more balanced content of various types of sub- 
ject matter; and (b) an improvement in the 
reading ease and human interest of certain 
types of subject matter. After a six months 
period a second evaluation was made using the 
same techniques as in the first evaluation. A 
comparison of the “before” and “after” sur- 
veys of employees indicates a significant in- 
crease in favorable attitude toward the pub- 
lication. The reading ease and human interest 
of the revised publication were also found to 
be at a more appropriate level than similar 
Flesch indices reported for house organs by 
two other investigators. 
A Table for Use with Flesch’s Level of Abstraction
Readability Formula


Dik Warren Twedt 
Northwestern University 


Flesch (2) has recently presented a method 
for measuring level of abstraction. As a 
rough measure of the level of abstraction of 
prose, he suggests using the percentage of 
“definite words,” a measure which corre-
lated .55 with comprehension by grade school
children. (Examples of “definite words” are 
names of people, numeral adjectives, finite 
verbs, pronouns, etc.) 

When the percentage of definite words is 
combined with word length (syllables per 100 
words) in a multiple regression formula, the 
resulting single index of readability correlates 
.72 with comprehension. Scores computed by 
the new readability formula have a range from 
0 (very difficult) to 100 (very easy).

As a simple device for counting the number 
of definite words, the space bar on an ordinary 
typewriter is helpful. Set the margin stop at 
0, and hit the space bar each time a definite
word is encountered. When the sample pass-
age is completed, read the total from the cyl-
inder scales. To get the syllable count, go
through the passage again, this time counting
all syllables (except the first) in all words of
more than one syllable, and add the total to
the number of words tested.

Then enter the accompanying table to ob- 
tain corresponding readability scores. For 
example, if the sample passage has 28 definite 
words per 100 words of copy, and a syllable 
index of 141, the readability score is 69. This 
table is similar to that prepared by Farr and 
Jenkins (1) for Flesch’s earlier formulas. In 
analyzing 137 magazine advertisements, all 
but 2 were found to be within the limits of the 
table. 
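The note does not print the formula itself. The version of Flesch’s (1950) level-of-abstraction formula usually cited is score = 168.095 + 0.532 dw - 0.811 wl, with dw the definite words per 100 words and wl the syllables per 100 words. These coefficients are an outside assumption here, but they do reproduce the worked example above (28 definite words and a syllable index of 141 give 69). The sketch also follows the counting shortcut described above: the syllable index is the number of words plus every syllable after the first in polysyllabic words. Function names are hypothetical.

```python
def syllable_index(syllables_per_word):
    """Words counted once each, plus every syllable after the first.

    syllables_per_word: one syllable count per word in a 100-word sample.
    The result equals the total syllable count of the sample.
    """
    extra = sum(s - 1 for s in syllables_per_word if s > 1)
    return len(syllables_per_word) + extra


def abstraction_score(definite_words_per_100, syllables_per_100):
    """Assumed coefficients for Flesch's (1950) level-of-abstraction formula."""
    return 168.095 + 0.532 * definite_words_per_100 - 0.811 * syllables_per_100
```

abstraction_score(28, 141) is 68.64, which rounds to the 69 read from the table.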


Received March 13, 1951. 
Early publication. 
Vocational Interests and Managerial Success*


E. B. Knauft 
Personnel Research Section, E. I. duPont de Nemours & Company, Wilmington, Delaware 


The present research evolved from an at- 
tempt to increase the efficiency of a selection 
battery for bake shop managers (2). Experi- 
ence with this battery indicated that the Classi- 
fication Inventory, an inventory of personal 
preferences and attitudes, was a moderately 
successful predictor of managerial success. It 
therefore seemed logical to investigate other 
measures of interests and attitudes which 
might be related to job success. The Strong 
Vocational Interest Blank was chosen because 
it could be scored with a large number of 
existing keys which would afford a comparison 
between the vocational interests of bake shop 
managers and those of other occupational 
groups. In addition, the responses of man- 
agers known to be successful or unsuccessful 
on the basis of an independent criterion could 
be used to construct a bake shop manager 
scoring key. 


Measured Interests of Managers 


The S.V.I.B. was first administered to 38 
men who had been managing retail manu- 
facturing bake shops for three months or more. 
These managers had a mean age of 42.6, a mean 
education of 9.7 grades and a mean of 8.8 years 
managerial experience. In this company each 
manager has primary responsibility for the 
profitable operation of his shop, including pur- 
chasing, production and merchandising. The 
38 men represented the top 27 per cent (N = 
19) and bottom 27 per cent of all managers on 
the basis of an independent criterion of job 
success to be discussed in detail later in the 
paper. The responses of the 38 managers 
were scored on 39 of Strong’s occupational 
keys and interest maturity, occupational level 
and masculinity-femininity. The mean stand-
ard scores of these 38 managers on each key
were computed to determine the interest 
pattern of the “average” manager.

* This study was completed while the author was associated with Federal Bake Shops, Inc., Davenport, Iowa.


Following Darley’s (1) classification of pri- 
mary, secondary and tertiary interests, the 
following patterns emerged. Group VIII 
scores formed a primary pattern with a letter 
grade of A for office man, B+ for accountant, 
purchasing agent and mortician, and B for 
banker. The mean score for production man- 
ager, the only occupation forming Group III, 
was B+. The score of B was earned for presi- 
dent of a manufacturing concern, the only 
occupation in Group XI. A tertiary pattern 
was found in Group IX with a B for sales 
manager, B for real estate salesman and B−
for life insurance salesman.

The high score for production manager 
might be anticipated because this occupation 
is probably more similar to bake shop manager 
than any other of the 39 groups. Strong (3) does
not give specific information about the job 
duties of the production managers used in his 
norm group. Since some of that group were 
listed in Thomas’ Register of American Manu- 
facturers it may be assumed that Strong’s man- 
agers held positions of greater responsibility 
than the bake shop managers. On the other 
hand, there is probably some overlap of the 
job requirements in the two instances. 

The primary pattern in Group VIII might 
be explained on the basis of two duties of the 
bake shop manager: (1) simple bookkeeping, 
report work, payrolls and balancing cash daily, 
and (2) the purchase of raw materials and 
supplies. Bookkeeping and report work, how- 
ever, are decidedly of secondary importance 
for the successful bake shop manager. The 
purchasing of materials is generally a routine 
task of “average” importance, but occasionally 
bad judgments in purchases can have a sig- 
nificant effect on profits. 

The B score for president of a manufacturing 
concern appears meaningful in terms of analy- 
sis of the job duties of this group compared 
with the present managers. Probably the 
principal difference between the duties of the 
presidents and the bake shop managers is one 
of degree. Strong’s presidents head organiza-
tions which buy raw materials, manufacture
products from these materials and then sell 
the products. The bake shop manager heads 
a very small organization which performs the 
same three functions and his success, like that 
of the president, is largely measured by the 
resulting profits. 

The tertiary pattern in Group IX, the sales 
group, reflects somewhat the sales responsibili- 
ties of the manager. The average manager 
has practically no direct contact with custom-
ers compared with the Group IX salesmen, who
all engage in direct selling. However, the
manager engages in merchandising activities 
such as the planning of variety of products, 
selection and pricing of specials and the over- 
all supervision of the sales force. If Strong 
had a key for merchandise manager, we would 
expect the bake shop manager to score fairly 
high on it. 

The bake shop managers had a mean occupa- 
tional level standard score of 50.8. When this 
score is compared with the mean O. L. score 
of 63.4 which Strong (3) reports for presidents 
and the score of 60.2 for production manager, 
it is apparent that the bake shop manager is
far below the other two supervisory jobs in the
occupational hierarchy. In fact, carpenters
and policemen are the only occupations in 
Strong’s 39 scoring groups having O. L. mean 
scores below the bake shop managers. Strong 
(3, p. 190) reports a mean O. L. score of 49.0 
for a group of 15 foremen and a score of 48.3 for 
122 skilled workers. No O. L. score is avail- 
able for bakers, but since the great majority of 
the bake shop managers were originally bakers, 
and still spend part of their time baking, it 
seems probable that in terms of occupational 
level the bake shop manager is more closely 
identified with the skilled trades and foremen 
groups than with the typical managerial 
groups. 

The O. L. scores of the 38 managers were 
found to correlate −.02 with age, .29 with
educational level and .21 with size of store 
managed. None of these coefficients are sig- 
nificant; a correlation of .32 is required for 
significance at the 5 per cent confidence level 
for a sample of 38 cases. The correlation be- 
tween O. L. and store size was computed to test 
the hypothesis that men with higher O. L. scores 
eventually win promotions to the larger stores. 
The correlation of .21 suggests that this hypothesis
may be tenable but should be tested
on a larger number of cases. 
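The critical value of .32 quoted above can be checked with a short calculation. The sketch below is an illustration, not the author's procedure. It replaces the exact t-distribution tables with the Fisher z-transformation, in which atanh r is treated as approximately normal with standard error 1/√(n − 3). It then applies that approximate cut-off to the three coefficients reported for the 38 managers.

```python
import math
from statistics import NormalDist


def critical_r(n: int, alpha: float = 0.05) -> float:
    """Approximate two-tailed critical Pearson r for n cases.

    Under rho = 0, atanh(r) is roughly normal with standard error
    1 / sqrt(n - 3) (Fisher z-transformation), so the critical r is
    tanh(z_crit / sqrt(n - 3)). This is an approximation, not the t table.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = .05
    return math.tanh(z_crit / math.sqrt(n - 3))


# Coefficients reported for the O. L. scores of the 38 managers.
reported = {"age": -0.02, "educational level": 0.29, "store size": 0.21}

r_crit = critical_r(38)
print(f"approximate critical r for n = 38: {r_crit:.2f}")
for variable, r in reported.items():
    print(f"{variable:18s} r = {r:+.2f}  significant: {abs(r) >= r_crit}")
```

With 36 degrees of freedom, the approximation and the t-based value both round to .32.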


Differentiating Good and Poor Managers with Strong’s Keys


It was previously stated that the group of 
38 managers was actually selected from a 
larger population of managers on the basis of 
their job success. The criterion of job success 
was a ratio of controllable costs to the sales of 
the unit under the manager’s direction.¹ The
entire population of 70 managers was ranked in 
order of their controllable cost ratios and the 
highest and lowest 27 per cent (N = 19 in each 
sub-group) formed the sample who took the 
S.V.I.B. For convenience these sub-groups 
may be designated as “good” and “poor” 
managers. 
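The extreme-group selection can be sketched as follows. Taking 27 per cent of 70 managers gives 18.9, which rounds to the 19 per group reported above. The manager identifiers and ratios in the usage lines are hypothetical. The sketch also assumes that a lower controllable cost ratio marks a better manager, because the text does not state the direction or say how ties at the cut were handled.

```python
def extreme_groups(cost_ratios: dict, fraction: float = 0.27):
    """Return (good, poor) lists of manager ids from the two tails.

    Assumes a LOWER controllable cost ratio means better performance.
    k is the rounded number of cases in each tail (27% of 70 -> 19).
    """
    k = round(len(cost_ratios) * fraction)
    ranked = sorted(cost_ratios, key=cost_ratios.get)  # best (lowest) first
    good = ranked[:k]
    poor = ranked[-k:]
    return good, poor


# Hypothetical data: 70 managers with distinct, made-up cost ratios.
ratios = {f"mgr{i:02d}": 0.30 + 0.001 * i for i in range(70)}
good, poor = extreme_groups(ratios)
print(len(good), len(poor))
```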

In order to determine if Strong’s occupa- 
tional keys are useful in differentiating between 
the good and poor managers, the mean scores 
of the 19 “good” managers and the 19 “poor” 
managers on the 39 occupational keys and 
interest maturity, occupational level and 
masculinity-femininity were computed. An
analysis of these data indicated that for no 
key was the difference between mean scores of 
good and poor managers significant at the 5 
per cent level of confidence. A detailed analy- 
sis of individual scores as well as a study of the 
patterns of scores yielded no additional in- 
formation, and so it was concluded that no 
existing S.V.I.B. keys or combinations of keys 
would be useful in predicting managerial suc- 
cess in the present context. This finding 
might be anticipated because: (1) no key ex- 
ists for any job which is very similar to the 
bake shop manager; and (2) Strong’s keys are 
based on a comparison between “successful” 
men in the given occupation and “men-in-gen- 
eral” rather than between groups who are suc- 
cessful and unsuccessful on the given job. 

¹ A complete discussion of various measures of the job performance of bake shop managers, as well as reliability data for these measures, may be found in reference (2). Experience has indicated that the controllable cost ratio, corrected for district effect, is the most reliable and relevant single measure of managerial success in the company under consideration.

A bake shop manager key was therefore constructed on the basis of the responses of the highest 27 per cent and lowest 27 per cent of the managers in terms of the job success criterion. The scoring keys were based on items which differentiated between the high and low criterion groups at the 10 per cent confidence level or better. On the basis of the extent and direction of item responses, as determined by the phi coefficient, four scoring keys were constructed with item weights of +1, +2, −1 and −2 respectively. These keys were used to score S.V.I.B.’s subsequently administered to 67 applicants for the position of bake shop manager. These same applicants also took other tests in the manager selection battery (2). It was found that the S.V.I.B., scored with the bake shop manager key, correlated .27 with Jurgensen’s Classification Inventory (also scored with a special bake shop manager key) and .34 with the Wonderlic Personnel Test. These intercorrelations indicate that the S.V.I.B., scored with the manager key, is measuring responses which are somewhat independent of the other two tests in the battery.
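Below is a minimal sketch of the item-selection step. It assumes each item is scored as endorsed or not and cross-tabulated against the good and poor criterion groups. The phi coefficient and its chi-square test (χ² = Nφ² with 1 d.f.) are standard. The article does not report how items were assigned weights of 1 or 2, so the `STRONG_PHI` cut-off of .40 used here is purely hypothetical.

```python
import math
from statistics import NormalDist


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for a 2 x 2 table (assumes no row or column total is zero).

    Rows: good managers (a, b) and poor managers (c, d).
    Columns: item endorsed (a, c) and not endorsed (b, d).
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


# Hypothetical cut-off separating weights of 2 from weights of 1.
STRONG_PHI = 0.40


def item_weight(phi: float, n: int, alpha: float = 0.10) -> int:
    """Weight an item +/-1 or +/-2, or drop it (0) if not significant.

    chi-square = n * phi**2 with 1 d.f.; its critical value equals the
    square of the two-tailed normal deviate for alpha (about 2.706 at .10).
    """
    chi_sq_crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2
    if n * phi ** 2 < chi_sq_crit:
        return 0
    size = 2 if abs(phi) >= STRONG_PHI else 1
    return size if phi > 0 else -size


# Hypothetical item: 13 of 19 good and 7 of 19 poor managers endorsed it.
phi = phi_coefficient(13, 6, 7, 12)
print(round(phi, 3), item_weight(phi, 38))
```

Under this assumed rule, an item's weight determines which of the four keys (+1, +2, −1, −2) it would belong to.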

A cross-validation study of the manager key 
for the S.V.I.B. was conducted on a population 
of 32 manager applicants who were selected for 
manager training. Since some of this group 
were dismissed or left voluntarily before their 
six months’ training was completed, it was im- 
possible to use the controllable cost ratio as a 
measure of job success for all of these men. 
However, it was felt that the men who did not 
complete their training should be considered as 
part of the criterion population because com- 
pletion of the training course is a prerequisite 
to a managerial position. An operational cri- 
terion measure was available for those men who 
completed their training and became managers. 
One meaningful validity measure of the 
S.V.I.B. manager score is a biserial correlation 
between that score and a “success-failure” 
criterion of job performance. The “success” 
group (N = 16) was composed of managers whose performance, based on mean controllable cost ratios for a six months’ period, placed them in the top 50 per cent of all 91 managers in the company. The “failure” group (N = 21) was composed of men who failed to complete the training course, who voluntarily left the company during their first six months as managers, or who fell in the lower 50 per cent of all managers in terms of the controllable cost ratio of the unit they operated. The biserial correlation for the population of 32 men
was .53. This coefficient is significant at the 
1 per cent level of confidence. 
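The biserial coefficient treats the success-failure split as a dichotomized normal variable. The sketch below uses the standard textbook formula r_bis = (M_s − M_f)/σ × pq/y. Here p and q are the proportions in the two groups, σ is the standard deviation of all scores, and y is the normal ordinate at the point dividing p from q. The score lists are hypothetical, because the article does not report the raw scores. The point-biserial is included for comparison: the biserial is always the larger of the two in magnitude, and in some samples it can even exceed 1.

```python
import math
from statistics import NormalDist, mean, pstdev


def biserial(success: list, failure: list) -> float:
    """Biserial r between scores and a success/failure dichotomy."""
    scores = success + failure
    p = len(success) / len(scores)   # proportion in the "success" group
    q = 1 - p
    sigma = pstdev(scores)           # SD of all scores combined
    y = NormalDist().pdf(NormalDist().inv_cdf(p))  # ordinate at the p/q split
    return (mean(success) - mean(failure)) / sigma * (p * q / y)


def point_biserial(success: list, failure: list) -> float:
    """Point-biserial r, for comparison with the biserial."""
    scores = success + failure
    p = len(success) / len(scores)
    q = 1 - p
    return (mean(success) - mean(failure)) / pstdev(scores) * math.sqrt(p * q)


# Hypothetical manager-key scores (not the article's data).
success = [62, 70, 75, 58, 66]
failure = [50, 55, 61, 48, 57, 53]
print(round(biserial(success, failure), 2), round(point_biserial(success, failure), 2))
```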

The biserial of .53 is particularly noteworthy because the 32 men had previously been screened on the basis of the manager selection battery consisting of the Personnel Test, the Classification Inventory and a test of baking knowledge (such prior screening restricts the range of scores and would ordinarily lower a validity coefficient). The small number of cases makes the multiple correlation technique an inappropriate measure of the contribution of the S.V.I.B. to the battery, but the biserial correlation and the low intercorrelations between
the S.V.I.B. and the other tests suggest that 
the S.V.I.B., scored with the manager key, is a 
useful and independent addition to the battery. 
A detailed examination of the S.V.I.B. scores 
of this population shows that 10 out of 13 or 
77 per cent of the “success” group received 
scores above the 59th percentile on the man- 
ager key while only 6 out of 19 or 32 per cent 
of the “failure” group received scores above the 
59th percentile. The distribution of scores indicated that the 59th percentile would be the most efficient and practical tentative cutting score to use for the S.V.I.B. manager key.
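The percentages for the 59th-percentile cut can be reproduced from the counts above. The sketch also computes an overall proportion of correct classifications. The article does not report that figure; it is derived here only to show one way of summarizing the same counts. It assumes the 32 men split into 13 “success” and 19 “failure” cases, as the counts imply.

```python
def cut_score_summary(success_above, n_success, failure_above, n_failure):
    """Percentages above the cut in each group, plus overall hit rate.

    A "hit" is a success case above the cut or a failure case at or below it.
    """
    pct_success_above = 100 * success_above / n_success
    pct_failure_above = 100 * failure_above / n_failure
    hits = success_above + (n_failure - failure_above)
    pct_hits = 100 * hits / (n_success + n_failure)
    return pct_success_above, pct_failure_above, pct_hits


s, f, h = cut_score_summary(10, 13, 6, 19)
print(f"success above cut: {s:.0f}%, failure above cut: {f:.0f}%, hits: {h:.1f}%")
```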


Summary 


The Strong Vocational Interest Blank was 
administered to 38 managers of shops in a re- 
tail bakery chain. An analysis of the mean 
scores of the group, based on Strong’s 39 voca- 
tional keys, revealed the following patterns of 
measured interests: Mean score of B+ for 
production manager; primary pattern of 
Group VIII occupations with a score of A for 
office man, B+ for accountant, purchasing 
agent and mortician and B for banker; tertiary 
pattern in Group IX, the sales occupations; 
mean score of B for president of manufacturing 
concern. An attempt was made to explain 
these scores and patterns in terms of similari- 
ties and differences between the above occupa- 
tions and bake shop manager. 

The relationship between scores earned on 
Strong’s keys and managerial success, as 
determined by an independent criterion, was 
found to be insignificant for all 39 occupational 
keys and interest maturity, occupational level 
and masculinity-femininity. A bake shop manager
scoring key was constructed on the basis
of responses differentiating the successful from 
the unsuccessful managers. This key was then 
cross-validated on a second group of 32 man- 
ager trainees and new managers. A biserial 
correlation of .53 was found between job suc- 
cess and scores on the bake shop manager key. 
Since this key showed relatively low intercor- 
relations with other tests in an existing man- 
ager selection battery, it was concluded that 
the S.V.I.B., scored with the special key,
would increase the efficiency of the selection
battery.

Vocational Interests and Q − L Scores on the A. C. E.


John W. Gustad 
Vanderbilt University 


While Adkins and Kuder (1), Carter (4), and 
Strong (11) have all reported low correlations 
between measures of intelligence and voca- 
tional interests, Carter (3) and Darley (6) have 
proposed hypotheses concerning the origins of 
interests in intelligence factors which are in es- 
sential agreement. Carter says that, “These 
patterns consist, basically, of a series of ap- 
proximations in the attempt of the organism 
to fit itself with its biological qualities into
somewhat rigid social structures.” 

Dealing with intelligence factors rather than 
single, gross estimates, Adkins and Kuder (1) 
investigated the relationship between the 
Kuder Preference Record and the Thurstone 
Primary Mental Abilities Test. Taking stu- 
dents scoring in the upper and lower 27 per 
cent of the distribution on each Kuder scale, 
they plotted mean standard scores of these on 
each of the P. M. A. keys. There were sug- 
gestive profile differences, but the actual differ- 
ences were, for the most part, not statistically 
significant. They also presented correlations 
between the Kuder scales and the sub-test 
scores of the P. M. A.; all of them were low, the 
highest for men being .27 between literary 
interests and verbal ability. 

Darley (5), in a study primarily concerned 
with re-assessing the predictive efficiency of 
certain tests for college success, obtained cor- 
relations between the Primary Mental Abilities 
Test and certain Strong keys (selected because 
they were representative of factors isolated 
earlier). These keys were: chemist, carpenter, 
Y. M. C. A. secretary, purchasing agent, life 
insurance salesman, and lawyer. Using cor- 
relation ratios as the measure of association, he 
found that only eleven of the forty-two co- 
efficients were significant at or beyond the five 
per cent level, the largest being −.31 between
purchasing agent and verbal ability scores. 
Four others were between .20 and .30. Ignor- 
ing directional signs, the median coefficient was 
found to be .09. In general, these results indi- 
cated substantial independence of measured 
interests and abilities. 


Long (7) made a similar study in which he 
used the Strong, the A. C. E., and the Zyve 
Scientific Aptitude Test. The groups he used 
for comparisons were composed of students 
who either got B plus and above or B and below 
on certain interest area measures. He first 
compared Zyve scores for these two groups and 
found that those with stronger interests in the 
technical-mathematical areas got higher scores 
than those with less intense interests, while in 
the welfare, business contact, and business de- 
tail areas, the opposite was found. Repeating 
this comparison with the A. C. E., Long found 
no differences except that students with verbal 
interests tended to get higher scholastic apti- 
tude scores. 

Another approach to the same problem was 
taken by Munroe (9) who selected the upper 
and lower quarters of a distribution made up 
of difference scores between the Quantitative 
and Linguistic sections of the A. C. E. and then 
compared Rorschach responses between these groups. Her subjects were women college students; those with the higher difference scores she called quantitative thinkers, and those with lower scores, linguistic thinkers. She found no difference in over-all adjustment, numbers of responses, or numbers of words used. The quantitative group she described as formal, intellectual, objective, while she characterized the linguistic group as subjective, introversive, with rich imaginations.
These results are in line with those obtained 
by Roe (10) who gave Rorschachs to a group 
of paleontologists and technicians and found 
them to be like Munroe’s quantitative think- 
ers: abstract, formal, objective. 


Purpose 


The purpose of the present study was to 
compare the vocational interest patterns and 
clinical key scores made on the Strong Voca- 
tional Interest Blank by a group of students 
with varying differences between the Quantita- 
tive and Linguistic parts of the A. C. E. in 
order to obtain evidence bearing on the hypotheses concerned with the origins of such interests.


Method 


During the winter quarter, 1950, the men in 
the junior class of the Arts and Engineering 
colleges were asked to complete the Strong 
Vocational Interest Blank. Men were chosen 
for greater homogeneity; juniors were selected 
because it was assumed that, by the third year, 
vocational interest patterns would have had a 
better chance to mature than among freshmen. 
Scores made at entrance on the A. C. E. (1947 
edition) were obtained from the files of the 
University Counseling Service. There were 
217 cases with both sets of data. 

For each student, the difference between the 
local percentiles made on the Quantitative and 
Linguistic sub-sections of the A. C. E. was com- 
puted and a distribution of these differences, 
with algebraic signs maintained, was made. 
The mean was found to be −2.1, the standard deviation, 31.9. This distribution was then divided into three parts at the plus one-half and minus one-half sigma points. The three groups so selected were as follows: one with high positive difference scores, indicating relatively higher Quantitative scores (the Q group); one with high negative difference scores, indicating relatively higher Linguistic scores (the L group); and a final one composed of those whose scores were nearly equal (the O group).
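The three-way split can be expressed as a short classification rule. The mean (−2.1) and standard deviation (31.9) come from the text. That puts the cuts at about +13.85 and −18.05, a full standard deviation apart. The sketch assigns a difference falling exactly on a cut point to the O group; that tie rule is an assumption, since the text does not say how such cases were handled.

```python
MEAN_DIFF = -2.1   # mean of Q - L percentile differences (from the text)
SD_DIFF = 31.9     # standard deviation of those differences (from the text)


def aptitude_group(q_pct: float, l_pct: float) -> str:
    """Assign "Q", "L", or "O" from local Q and L percentiles.

    Cuts at mean +/- 0.5 SD; a difference exactly on a cut goes to O
    (an assumption, not stated in the study).
    """
    diff = q_pct - l_pct
    upper = MEAN_DIFF + 0.5 * SD_DIFF   # about +13.85
    lower = MEAN_DIFF - 0.5 * SD_DIFF   # about -18.05
    if diff > upper:
        return "Q"
    if diff < lower:
        return "L"
    return "O"


for q, l in [(80, 50), (40, 70), (60, 60), (70, 57), (70, 56)]:
    print(q, l, aptitude_group(q, l))
```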

Using the system outlined by Darley (5), 
primary interest patterns were identified. 
Eighty-three per cent of the Q, 73 per cent of
the O, and 85 per cent of the L group had such
primaries. The differences between these per- 
centages were not statistically significant. 
There were no primary patterns in groups VI 
(Musician), VII (C. P. A.), and XI (President, 
Manufacturing Concern). Because there were 
only four in group III (Production Manager), 
these were combined with group IV (Sub- 
technical); the mean correlation between the 
Production Manager key and the keys in 
group IV was found to be .47, based on data of 
Strong (11). For purposes of analysis, there 
were then 60 students in the Q, 54 in the O, and 
60 in the L group. 


Results 


A profile comparison was first made. Pro- 
portions of students in each group who had pri- 
mary patterns in each of the interest areas were 
computed and curves were drawn. These are 
presented in Figure 1. 

The general impression is one of over-all 
similarity both as to shape and elevation. 
Working from the hypothesis that different 
kinds of intelligence are associated with differ- 
ent interest patterns, it was expected that the 
Q group would have a majority of its primary 
patterns in groups I through IV, that the L 
group would have its primaries in areas V 
through X, and that the O group would fall 
between. There appears to be some indication 
that this is so, but the L peak in group IV and 
the Q peak in group IX do not fit with these 
expectations. 

Tests of significance of the differences between the proportions were made and are included in Table 1.¹ Only the one between Q and L in group X was significant, falling at the .01 level. A broader test was then made with the interest areas divided into scientific-technical and welfare-business groups. Again, the differences between proportions of the three aptitude groups having primary scores in these areas were not statistically significant. There seemed, then, on the basis of the foregoing tests, to be no reason to reject the null hypothesis.

¹ To reduce printing costs, Table 1, containing these tests as well as the actual proportions shown graphically in Figure 1, has been deposited with the American Documentation Institute. Order Document 2943 from American Documentation Institute, 1719 N Street, N.W., Washington 6, D. C., remitting $0.50 for microfilm (images 1 inch high on standard 35 mm. motion picture film) or $0.50 for photocopies (6 × 8 inches) readable without optical aid.
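The article does not say which significance test was applied to the differences between proportions. A common choice at the time was the critical ratio for two independent proportions, sketched below with a pooled standard error. The counts in the usage lines are hypothetical and are not taken from the deposited Table 1.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Critical ratio for the difference between two independent proportions.

    Uses the pooled proportion under the null hypothesis p1 == p2 and
    returns (z, two-tailed p-value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 30 of 60 in one group vs. 15 of 60 in another
# have a primary pattern in some interest area.
z, p = two_proportion_z(30, 60, 15, 60)
print(f"z = {z:.2f}, p = {p:.4f}")
```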

Because the Q — L difference, as taken, al- 
lowed for uncontrolled operation of the general 
intelligence factor, an additional step was 
undertaken to assess the effect of this. The Q 
and L groups were subdivided, in terms of 
total A. C. E. score percentiles, into two groups 
at the fiftieth percentile. Then, following the 
technique used above, profiles were drawn for 
the Q and L sub-samples scoring above the 
median; similar profiles were drawn for the 
below average samples. Profiles for both the 
high Q and L and the low Q and L groups were 
strikingly similar, both in shape and altitude. 
From this, it would seem that the general intel- 
ligence factor was not obscuring real differ- 
ences between the groups. 

The next step involved a study of the non- 
occupational, or clinical, keys of the Strong: 
Occupational Level, Interest Maturity, and 
Masculinity-Femininity. Using all members, 
both with and without primary patterns, of 
the three aptitude groups as a base, analyses 
of variance of clinical key scores were com- 
puted, a separate analysis being made for each 
key, using the technique outlined by McNemar 
(8, pp. 247-249). The purpose was to deter- 
mine whether the three aptitude groups ob- 
tained significantly different mean scores on 
the clinical keys. The results of these analy- 
ses are presented in Table 2. Using these data, 
F tests were computed for the ratio of Between 
and Within variance estimates. 

For OL, this was 2.775; for IM, it was 2.673; for MF, it was 1.232. None of these ratios was significant at the .05 level of confidence. From these results, it may be concluded that, as for the vocational interest patterns, there were no differences between the aptitude groups in terms of clinical key scores.

Table 3
Correlations between Q − L Scores and Scores on the Clinical Keys

           OL        IM       MF
Q − L     −.167*    .032     .070

* Denotes statistically significant at less than the .05 level.
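The F ratios can be recovered from the sums of squares in the analysis-of-variance table. The sketch assumes the analysis used all 217 students, giving 217 − 3 = 214 within-groups d.f. With that assumption, the computed ratios match the reported 2.775 and 2.673. The tabled critical value of F with 2 and 214 d.f. at the .05 level is roughly 3.04, which is why neither ratio is significant.

```python
def f_ratio(ss_between, ss_within, k_groups, n_total):
    """One-way analysis-of-variance F from sums of squares."""
    df_between = k_groups - 1
    df_within = n_total - k_groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within


# Sums of squares from Table 2; N = 217 is an assumption (all students).
table2 = {"OL": (157.891, 6088.589), "IM": (178.569, 7148.358)}
F_CRIT_05 = 3.04  # approximate tabled F(2, 214) at the .05 level

results = {}
for key, (ssb, ssw) in table2.items():
    f, df1, df2 = f_ratio(ssb, ssw, k_groups=3, n_total=217)
    results[key] = f
    print(f"{key}: F({df1}, {df2}) = {f:.3f}, significant: {f >= F_CRIT_05}")
```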

Finally, correlations between Q — L scores 
and scores on the clinical keys were computed. 
These are presented in Table 3. Only one of these was significantly different from zero: the one between OL and Q − L. Its sign indicates that there is a slight tendency for dominantly quantitative individuals to have somewhat lower OL scores, but the relationship is small enough that it should probably not be given much weight. These coefficients are of the magnitude of those reported by Strong (11) between intelligence test scores and the clinical keys: .05 for OL, .13 for IM, and −.16 for MF.
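One way to check which coefficients in Table 3 differ from zero is the t test for a correlation, t = r√(n − 2)/√(1 − r²). The sketch assumes the correlations were computed on all 217 students. It uses the normal-curve value 1.96 as a large-sample critical value; the t-table value for 215 d.f. is about 1.97.

```python
import math
from statistics import NormalDist


def t_for_r(r: float, n: int) -> float:
    """t statistic (n - 2 d.f.) for testing a correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


N = 217  # assumption: correlations computed on all students
CRIT = NormalDist().inv_cdf(0.975)  # about 1.96, large-sample two-tailed .05

table3 = {"OL": -0.167, "IM": 0.032, "MF": 0.070}
t_values = {key: t_for_r(r, N) for key, r in table3.items()}
for key, t in t_values.items():
    print(f"Q - L with {key}: t = {t:+.2f}, significant: {abs(t) >= CRIT}")
```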


Discussion 


The failure to obtain statistically significant differences between interest patterns of students with differing aptitude scores leads to the necessity, at least within the framework of the present study, of accepting the null hypothesis that there are no interest differences associated with ability differences. That, in turn, leaves in doubt the hypotheses concerning the origins of such interests in basic aptitudes. It also raises the question of why there should have been personality differences, as measured by the Rorschach, and not vocational interest differences. One answer to this last question may lie in the method of analysis. It is possible that a more detailed analysis, such as an item analysis, would show differences. On the other hand, the clinical use of the Strong is in terms of the patterns which were dealt with in the present study. Whether an item analysis would provide data useful to counselors is doubtful.

Table 2
Analyses of Variance of Q − L Scores and the Clinical Keys

Source     d. f.    OL Sums of Squares    IM Sums of Squares
Between       2           157.891               178.569
Within      214          6088.589              7148.358
Total       216          6246.480              7326.927

Another consideration is the stability of the 
Q — L differences themselves. There is the 
possibility that a portion of these represent 
chance differences, a situation which would 
tend to obscure real differences between groups. 
It was, unfortunately, not possible to obtain a 
retest reliability of these differences, which 
would be necessary to estimate the stability of 
the scores. By cutting the difference distribu- 
tion so that a full standard deviation separated 
the Q and L groups, it was hoped that this 
situation would be minimized in its effects on 
the results. Lacking this information, the 
interpretation of the results must leave room 
for the possibility that some real difference in 
interests might have been obscured by the 
presence of people in the Q and L groups whose 
difference scores were products of chance and 
that, conversely, some whose aptitudes did in 
fact differ did not appear to. 

A further explanation for the results lies in 
the fact that the kinds, and to some extent the 
amounts, of intelligence required for various 
occupations are not generally recognized. 
There is the additional difficulty that most oc- 
cupations are so heterogeneous in their de- 
mands that not one but several patterns of 
aptitudes would be acceptable. The possi- 
bility of bringing data to bear on the basic 
hypothesis concerning the origins of interests 
is weakened by the lack of knowledge of occu- 
pational requirements, both in fact and in 
stereotype. For example, it is probably true 
that engineers must have considerable facility 
in symbolic-mathematical areas. Yet so, also, 
must certain welfare type workers, such as 
personnel psychologists, as well as some busi- 
ness men and most accountants. Further, 
engineers deal, in their jobs, with people and 
verbal material. Engineering school faculties
are aware, as are many students, that a con- 
siderable number of the graduates will end up 
working in sales and administrative jobs. How 
much selection, especially self-selection, is done 
in terms of judged abilities in these other areas 
is not known. 

What is probably needed is more information 
concerning the occupational stereotypes. Bor- 
din (2) has suggested that vocational interests 
represent a kind of role playing in terms of 
occupational stereotypes. It may be that 
more detailed information concerning both the 
objective job requirements as well as the oc- 
cupational stereotypes would lead to a more 
fruitful analysis which might, then, shed light 
on the problem of the origins of interests. 


Summary 


Two sets of scores were obtained for a group 
of men college juniors: one consisting of the 
difference between the Quantitative and Linguistic sub-score percentiles of the A. C. E.; the other, Strong Vocational Interest Blank scores. The Q − L distribution was cut to
isolate a dominantly quantitative, a dominantly 
linguistic, and a nearly equivalent group. 

Primary patterns on the Strong for these 
groups were compared to see whether aptitude 
type was associated with differences in inter- 
ests. No consistent pattern differences were 
found, and only one difference between pro- 
portions was significant. 

The clinical keys of the Strong were also 
studied, using the analysis of variance tech- 
nique, but no tendency to have differing clinical 
key scores among the three groups was found. 
Correlations were also obtained between the 
Q − L difference and scores on the clinical
keys; only the one between Q — L and OL was 
significant, and it was quite low. 

On the basis of these results, it was con- 
cluded that the hypothesis that vocational 
interests are conditioned by differential apti- 
tudes was unsubstantiated by the present 
study. 


Received July 21, 1950. 
Modified Directions for Strong Vocational Interest Blank
When Used with the Hankes Answer Sheet *


C. Harold Stone
Lt. Cdr., USNR, Great Lakes, Illinois


and 
Philip H. Kriedt 


Prudential Insurance Company of America, Newark, New Jersey 


In connection with a study of the measured 
interests of industrial relations personnel in 
management and union positions (conducted 
by mail), two methodological questions were 
raised. These were: (1) Will individuals fill 
out the Hankes Answer Sheet¹ as readily as
they will place their answers directly on the 
Strong Blank, and (2) will individuals com- 
plete the Hankes Answer Sheet as accurately 
as they will the Strong Blank itself? 

Results of an initial pilot study based on returns from a sample of 100 top industrial relations executives were considered adequate to answer the first question. Fifty executives were asked to record their answers on the separate Hankes Answer Sheet. The second group of 50 was sent only the Strong Blank and asked to record answers on the booklet itself. Forty-two returns were received from the first group and forty-three from the second. Apparently, the fact that it is somewhat more laborious to use the Hankes Answer Sheet was not sufficient to prevent individuals from responding.

In mailing the Strong Blank and Hankes 
Answer Sheet without providing special sup- 
plementary instructions, i.e., allowing respond- 
ents to complete the inventory on the basis of 
instructions printed on the back of the Answer 
Sheet and included in the Strong Booklet, it 
was found that approximately 20 per cent of 
the returns contained obvious errors in record- 
ing replies on the Answer Sheet.? As a result, 


* Data for this study were collected while Dr. Kriedt 
was Research Fellow and Dr. Stone was Research As- 
sociate on the staff of the Industrial Relations Center, 
University of Minnesota. 

! Reference is made to the Hankes Answer Sheet and 
Scoring Service in: Strong, E. K., Jr., and Hankes, E. J. 
A note on the Hankes Test Scoring Machine. J. appl. 
Psychol., 1947, 31, 212-214. 

2The majority of errors occurred in Part VI of the 
Strong Blank on questions 281 through 320. “Forced 


revised directions were prepared, mimeo- 
graphed, and pasted on the Interest Blank 
booklets. To determine the effectiveness of 
the new directions, one-half of a sample of 112 
union staff members and one-half of 300 in- 
dustrial relations personnel in management 
positions were sent booklets with revised di- 
rections and Hankes Answer Sheets. The 
other half of each of the two samples received 
unrevised booklets and Hankes Answer Sheets. 
Results based on returns from the “revised 
directions” groups and the “standard direc- 
tions” groups for both union and management 
damples are shown in Tables 1 and 2. Per- 
centage of returns slightly favored the “‘stand- 
ard directions” group in the union sample 
whereas in the larger management sampie the 
revised directions group substantially exceeded 
the “standard directions” group in per cent of 
returns. However, as shown in Table 2, both 
union and management samples reflect the 
same trend of more errors with the “standard 
directions.”” It is interesting to note that in 
both “revised directions” groups the percent- 
ages of error approximate 5 per cent. On the 
other hand, the “standard directions” union 
group shows slightly more than twice as many 
errors as the comparable management group 
(32 per cent as compared to 15 per cent). 
Data are not available to account for this 
difference. Perhaps the management group 
is somewhat more sophisticated in the taking 
of tests and filling out of questionnaires. The 
important fact, however, is that use of the 
revised directions on the Strong Blank, when 
responses are recorded on the Hankes Answer 


choice”’ instructions call for checking three out of ten 
choices as most liked, three least liked, and the remain- 
ing four as intermediate. Most common errors in- 
volved checking more than three choices as ‘most 
liked” or “‘least liked.” 
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PartI. Occupations. Indicate after each occupation listed below whether you would like that kind of 
work or not. Disregard considerations of salary, social standing, future advancement, etc. Consider 
only whether or not you would like to do what is involved in the occupation. You are not asked if you 
would take up the occupation permanently, but merely whether or not you would enjoy that kind of work, 
regardless of any necessary skills, abilities, or training which you may or may not possess. On the separate 
answer sheet: (Under the number of the item you are answering) mark an X in the top space (the row 
marked L) if you like that kind of work. Mark an X in the middle space (the row marked I) if you are 
indifferent to that kind of work. Mark an X in the bottom space (the row marked D) if you dislike that 
kind of work. Work rapidly. Your first impressions are desired here. Answer all the items. Many of 
the seemingly trivial and irrelevant items are very useful in diagnosing your real attitude. 


1 Actor (not movie)
2 Advertiser

Be sure to read the instructions to each new part before proceeding, since directions vary for different sections of this questionnaire.

Part IV. Activities. Indicate your interests as in Part I.
186 Repairing a clock
187 Adjusting a carburetor
188 Repairing electrical wiring

Part V. Peculiarities of People. Record your first impression. Do not think of various possibilities or of exceptional cases. "Let yourself go" and record the feeling that comes to mind as you read the item.

Part VI. Order of Preference of Activities. On the separate answer sheet indicate which three of the following ten activities you would enjoy most by placing three X's in the top row (marked L/1); also indicate which three you would enjoy least by placing three X's in the bottom row (marked D/3). Check the remaining four activities in the middle row (marked I/2).
Develop the theory of operation of a new machine, e.g., auto
Operate (manipulate) the new machine
Discover an improvement in the design of the machine

Part VII. Comparison of Interest between Two Items. On the separate answer sheet indicate your choice of the following pairs by making an X in the top row (1) if you prefer the item to the left, in the middle row (2) if you like both equally well, and in the bottom row (3) if you prefer the item to the right. Assume other things are equal except the two items to be compared. Work rapidly.
321 Street-car motorman    Street-car conductor

Part VIII. Rating of Present Abilities and Characteristics. Indicate on the separate answer sheet what kind of a person you are right now and what you have done. Place an X in the top row (Y) if the item really describes you, in the bottom row (N) if the item does not describe you, and in the middle row (?) if you are not sure. (Be frank in pointing out your weak points, for selection of a vocation must be made in terms of them as well as your strong points.)
361 Usually start activities of my group
362 Usually drive myself steadily (do not work by fits and starts)
363 Win friends easily

Mark an X in the top, middle, or bottom row to indicate whether the first, second, or third statement in each item applies to you.
389 (1) Feelings easily hurt  (2) Feelings hurt sometimes  (3) Feelings rarely hurt
390 (1) Usually ignore the feelings of others  (2) Consider them sometimes  (3) Carefully consider them

Fig. 1. Revised directions for the Strong Vocational Interest Blank for Men for use with Hankes Answer Sheet





Strong Vocational Interest Blank and Hankes Answer Sheet 


Table 1

Completed Answer Sheets Returned by Two Groups Receiving Strong Vocational Interest Blanks for Men with "Standard Directions" and Two Groups with "Revised Directions"

                          Number of   Number of Completed      Per Cent Returns
                          Blanks      Answer Sheets         Standard     Revised
Sample                    Mailed      Returned              Directions   Directions
Management Personnel        300          224                  65.3         84.0
Union Personnel             112           46                  44.8         37.5
Total                       412          270                  59.7         71.1


Table 2

Number and Per Cent of Errors Observed in Recording Responses to Strong Blank on Hankes Answer Sheet Using Strong's Standard Directions and I.R.C. Revised Directions

                        Number of Answer     Number of Answer     Per Cent of Answer
                        Sheets Returned      Sheets with Errors   Sheets with Errors
Sample                  Standard  Revised    Standard  Revised    Standard  Revised
Management Personnel       98       126         15        7         15.3      5.6
Union Personnel            25        21          8        1         32.0      4.8
Total                     123       147         23        8         18.7      5.4





Sheet, results in a substantial reduction of 
errors for both groups. 

The experience of the writers, as well as that 
of Dr. E. K. Strong, Jr. and others with whom 
the problem has been discussed, indicates that 
use of the Strong Blank without an answer 
sheet results in errors on approximately 5 per 
cent of the blanks returned in mail studies. 
It is concluded, therefore, that the results ob- 
tained with revised directions in the booklet 
when used with the Hankes Answer Sheet are 
as satisfactory as when responses are recorded 
directly on the Strong Blank. Where savings 
in time and money, which result from use of 
the Hankes Answer Sheet, are factors, either in 
research or in service operations, and where 
the Strong Blank must be administered with minimal supervision or none, use of revised directions on the Strong Blank is strongly indicated.³
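The authors' comparison rests on simple proportions. As an arithmetic check, the Python sketch below recomputes the error percentages of Table 2 from its counts. The data layout and the names `RETURNED`, `WITH_ERRORS`, `pct`, and `error_percentages` are illustrative choices, not part of the study.

```python
# Counts transcribed from Table 2, stored as (standard directions, revised directions).
RETURNED = {"Management": (98, 126), "Union": (25, 21)}
WITH_ERRORS = {"Management": (15, 7), "Union": (8, 1)}


def pct(part: int, whole: int) -> float:
    """Per cent rounded to one decimal place, as printed in the tables."""
    return round(100.0 * part / whole, 1)


def error_percentages(returned, with_errors):
    """Return {row: (standard %, revised %)}, including a pooled 'Total' row."""
    rows = {}
    for name, counts in returned.items():
        rows[name] = tuple(pct(e, r) for e, r in zip(with_errors[name], counts))
    # Pool the two samples before taking the percentage (not an average of rows).
    total_returned = [sum(v[i] for v in returned.values()) for i in range(2)]
    total_errors = [sum(v[i] for v in with_errors.values()) for i in range(2)]
    rows["Total"] = tuple(pct(e, r) for e, r in zip(total_errors, total_returned))
    return rows


if __name__ == "__main__":
    for row, (std, rev) in error_percentages(RETURNED, WITH_ERRORS).items():
        print(f"{row:<12} standard {std:5.1f}%   revised {rev:5.1f}%")
```

Running it reproduces the printed percentages, including the pooled 5.4 per cent for revised directions that the authors set beside the roughly 5 per cent error rate of blanks used without an answer sheet.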


The revised directions found to be effective in reducing errors in recording responses on the answer sheet are shown in Figure 1. When the mimeographed revised directions were pasted over the printed directions in the booklet, it was not necessary to obliterate the printed directions completely in all cases. In several instances, it appeared desirable to change only a portion of the original directions.


Received June 21, 1950.


³ Mr. E. J. Hankes of Engineers Northwest, Minneapolis, was shown a manuscript copy of the present article and responded by re-designing the Hankes Answer Sheet for Parts VI, VII, and VIII. This should provide a partial solution. In a letter to the Editor, Mr. Hankes wrote: "Engineers Northwest is grateful to the authors for their efforts and has added instructions on the answer sheet to clarify the situation. These new answer sheets have been substituted for the old. We also wish to express thanks to Dr. Wilbur Layton (University of Minnesota) for his suggestions for changes." If the Strong Blank itself were re-designed as indicated in Figure 1, then the problem discussed in this paper would be solved so far as it can be solved when the test is administered by mail.
Fakability of the Classification Inventory Scored for Self Confidence
Robert D. Mais 


Neenah, Wisconsin 


Falsifying on personality inventories is an 
important problem in the field of personality 
measurement. The problem arises when the 
subject chooses test responses he feels are 
socially acceptable rather than responses which 
actually describe him. A review of past 
studies (1, 2, 3, 6, 7, 8, 9, 10) indicates that 
most personality tests of the questionnaire or 
self-inventory type are susceptible to falsifica- 
tion if the subject is so motivated. The many 
suggestions for counteracting this weakness 
range from disguised names and questions to 
scoring stencils which measure the amount of 
faking on the Inventory. 

Jurgensen constructed his Classification In- 
ventory in a way which he believed would 
minimize the effect of faking. He paired 
socially acceptable and desirable responses 
with equally acceptable and desirable re- 
sponses, while undesirable responses were 
paired together. Jurgensen developed his 


Inventory for use in industry and validated it 


against job success rather than various per- 
sonality traits, but he did say that in certain 
situations it might be advisable to have keys 
which measure personality traits rather than 
specific occupations (4). 

Because a person who lacks self confidence 
is not sure of himself and wants to make a good 
impression on others, the effect of faking is 
especially important in measuring self con- 
fidence. Because it was difficult to get raters 
who knew the subjects well enough to rate 
them accurately yet who would not tend to 
rate them favorably, it was decided to use a 
self rating blank. The blank was designed to help the subject evaluate himself; to keep him from rating himself favorably to gain social approval, he was asked not to sign his name. The situation was such that
all rating blanks and Inventories could be 
paired. 

From a survey of the literature on the sub- 
ject it was concluded that self confidence is not 
a single trait but a combination of several 
traits which form the characteristic behavior 


of a person who has or who lacks self confidence. 
The rating blank was, therefore, made up of 
six traits: 1. Motives of others (People who 
lack self confidence distrust the ethics of 
others); 2. Social (People who lack self con- 
fidence are so self conscious they often feel ill 
at ease in a group and have difficulty meeting 
people); 3. Conformity (Those who lack self 
confidence may be over concerned with the 
impression they make on others and may be 
greatly motivated toward gaining social ap- 
proval); 4. Decisions (People lacking self 
confidence feel they are inadequate and fear 
failure and, therefore, may have difficulty 
making decisions); 5. Criticism (Criticism emphasizes the subject’s feeling of inadequacy, so
he dislikes being criticized and is hyper- 
critical of others as a defense mechanism); 6. 
Foresight (People who lack self confidence often 
become chronic worriers who imagine all sorts 
of possible future dangers rather than dealing 
with actual facts). Each of the traits had a 
line after it on which the subject was to check 
the point which most nearly described him. 
In order to obtain a uniform standard of evalu- 
ation, where all raters would tend to agree as to 
a median and to the variations above and be- 
low this median, each trait had five degrees 
described in detail below the line. Also, in an 
attempt to assure attention to descriptive 
phrases, the qualities of greatest value were 
sometimes placed at the right end of the scale 
and sometimes at the left. 
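Because the most valued degree sits at the right end of some lines and at the left end of others, the checks must be brought to a common direction before they are combined. The study does not say how this was done; the following Python sketch assumes each check is coded 1 (left end) to 5 (right end) and flips the reverse-printed traits. The trait names and the `orient` function are hypothetical.

```python
# Hypothetical coding of the rating blank: each of the six traits is checked at
# one of five degrees, recorded 1 (left end of the line) to 5 (right end).
TRAITS = ("motives", "social", "conformity", "decisions", "criticism", "foresight")


def orient(ratings: dict, valuable_on_left: set) -> dict:
    """Re-code ratings so that 5 is always the most valued (self-confident) degree.

    valuable_on_left names the traits whose most valued degree was printed at
    the left end of the line; which traits those were is an assumption here.
    """
    oriented = {}
    for trait in TRAITS:
        value = ratings[trait]
        if not 1 <= value <= 5:
            raise ValueError(f"{trait}: rating {value} is outside 1-5")
        # Reverse-printed scales are flipped: 1 <-> 5, 2 <-> 4, 3 stays 3.
        oriented[trait] = 6 - value if trait in valuable_on_left else value
    return oriented
```

For example, a check at the second degree of a reverse-printed trait is re-coded as 4, while the same check on a normally printed trait stays 2.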

The Jurgensen Classification Inventory and 
the rating blank were administered to a group 
of 100 University of Colorado students (50 
males and 50 females) in an education class. 
The class was told that it was not known how 
the average college student would respond to 
the Classification Inventory and that a good 
sample was desired. They were accordingly 
instructed to “Answer the questions as frankly
and as carefully as possible.” They were not
told what the Inventory was intended to 
measure. After the Inventory was completed 
the subjects were instructed to answer the 
rating blank. The rating blank was given
after the Classification Inventory was com- 
pleted so as not to affect the results on the 
Inventory in any way. It was again stressed 
that a good sampling was desired and that they 
should answer, “As frankly and as carefully as 
possible.” They were also told not to sign 
their names to either the Classification In- 
ventory or the rating blank. They were al- 
lowed to put down the date of their birth on 
the Inventory and rating blank if they so de- 
sired and were promised that they would be 
informed of the results according to these 
dates. In this manner the motivation to try 
to get a good score was removed by not re- 
quiring signatures, but an interest in the out- 
come of the test was instilled by promising to 
report the results by birth dates. An indica- 
tion of this interest was shown by the fact that 
only twelve out of the group of 100 failed to 
note their birth dates. 

To score the rating blanks, the mean and the 
standard deviation of the distribution for each 
trait were obtained. Then, the standard score
or z-score for each trait on each blank was com- 
puted and the total score on the rating blank 
was obtained by finding the algebraic sum of 
the z-scores. The highest and lowest 25% of 
this test group were selected according to the 
score made on the rating blank. The upper 
25% was made up of ten males and fifteen 
females, while the lower 25% was made up of 
twelve males and thirteen females. There was 
no significant difference in the scores obtained 
by males and females on the rating blank. 
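The scoring steps just described can be sketched in Python as follows. The sketch assumes each blank has already been coded so that a higher number means more self confidence, and it uses the population standard deviation, since the study does not say which formula was applied. The function names are illustrative.

```python
from statistics import mean, pstdev


def total_z_scores(blanks):
    """Algebraic sum of per-trait standard scores for each rating blank.

    blanks: one row per subject, one column per trait.
    """
    columns = list(zip(*blanks))
    # Population SD per trait (an assumption; the study does not specify).
    stats = [(mean(col), pstdev(col)) for col in columns]
    return [sum((x - m) / s for x, (m, s) in zip(row, stats)) for row in blanks]


def extreme_quarters(scores):
    """Indices of the highest and lowest 25 per cent of total scores."""
    k = len(scores) // 4
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return order[len(order) - k:], order[:k]
```

With the 100 students of the first group, `k` is 25, giving the two groups of 25 described above. A trait on which every subject checked the same degree would have a standard deviation of zero and would have to be dropped before scoring.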

Each item on the Classification Inventory for 
each of the fifty subjects was analyzed to 
determine the frequency of the responses. The 
frequencies were changed into proportions and 
item validity determined, by phi coefficients, 
from the proportion of subjects in each of the 
two groups who gave similar responses to each 
item. The writer followed Jurgensen’s sug- 
gestion: “. .. when working with upper- 
lower quarters or other such separated groups, 
a level of significance of .10 (nineteen items 
out of twenty indicating true differences) 
seems adequate” (5). 
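The phi coefficient for one response can be computed from the 2 x 2 table of group (upper or lower quarter) by response (given or not given). The study does not say how the significance of each phi was judged; this Python sketch assumes the usual relation chi-square = N times phi squared with one degree of freedom, without a continuity correction. Function names are illustrative.

```python
import math


def phi_coefficient(upper_yes, upper_n, lower_yes, lower_n):
    """Phi for the 2 x 2 table: group (upper/lower) by response (given/not given)."""
    a, b = upper_yes, upper_n - upper_yes
    c, d = lower_yes, lower_n - lower_yes
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # A response everyone (or no one) gave cannot discriminate; treat phi as 0.
    return (a * d - b * c) / denom if denom else 0.0


def phi_p_value(phi, n_total):
    """Two-sided p, using chi-square = N * phi**2 with 1 degree of freedom."""
    chi2 = n_total * phi ** 2
    # For 1 df, P(chi-square > x) equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))
```

For instance, a response given by 18 of the 25 upper-quarter subjects and 10 of the 25 lower-quarter subjects (hypothetical counts) yields a phi of about .32 and a p between .05 and .02.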

According to Jurgensen, “Tests having 
weighted scores require more time for scoring 
but may show increased validity or reliability” 
(5). Because of this, both weighted and un- 


weighted scores were used in this study. Items 
were weighted plus or minus one for signifi- 
cance between .10 and .05, plus or minus two 
between .05 and .02, and plus or minus three 
for significance of .02 and over. In this man- 
ner sixty-two items were selected to measure 
self confidence as defined above and as meas- 
ured on the rating blank. 
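The weighting rule can be written as a small function, followed by a scorer that adds up the weights of a subject's chosen responses. This Python sketch rests on stated assumptions: the study does not say how a p falling exactly on .05 or .02 was treated, or exactly how unweighted scores were formed (here each keyed response counts +1 or -1). Response labels such as `"12A"` are hypothetical.

```python
def item_weight(p, direction):
    """Scoring weight for one keyed response.

    p: significance level of the item's phi coefficient.
    direction: +1 if the response was favoured by the upper (self-confident)
    quarter, -1 if by the lower quarter.
    Assumption: a p exactly on a boundary falls in the stronger band.
    """
    if p <= 0.02:
        magnitude = 3
    elif p <= 0.05:
        magnitude = 2
    elif p <= 0.10:
        magnitude = 1
    else:
        magnitude = 0  # not significant at .10: response is left unkeyed
    return direction * magnitude


def inventory_score(chosen, key, weighted=True):
    """Sum key weights over the responses a subject chose.

    The unweighted variant counts each keyed response as +1 or -1 by sign,
    an assumption about what 'unweighted' meant in the study.
    """
    total = 0
    for response in chosen:
        w = key.get(response, 0)
        total += w if weighted else (w > 0) - (w < 0)
    return total
```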

The Classification Inventory and the rating 
blank were administered to a second group 
(thirty education students) with the same in- 
structions as were given to the first group. 
The scores on the rating blank were correlated 
with the weighted scores on the Classification 
Inventory using the key which was developed 
on the first group. The coefficient of correla- 
tion was .75 and is significant at the .01 level. 
The Classification Inventory, then, is a valid 
measure of self confidence as defined above and 
as measured with the rating blank. 

While the correlation between unweighted 
scores and scores weighted as above was .93, 
the validity coefficient for the unweighted 
scores was only .39 and is significant at the 
.05 level. Thus, in this case at least, it seems
well worth the effort to weight the scores.
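The reported significance levels can be checked with the usual t test for a correlation coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The study does not name its test, so this Python sketch is an assumption; the critical values are the standard two-tailed table values for 28 degrees of freedom.

```python
import math

# Two-tailed critical values of t for 28 degrees of freedom (n = 30),
# taken from a standard t table.
T_CRIT_DF28 = {0.05: 2.048, 0.01: 2.763}


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


if __name__ == "__main__":
    for label, r in (("weighted", 0.75), ("unweighted", 0.39)):
        t = t_for_r(r, 30)
        print(label, round(t, 2), {a: t > c for a, c in T_CRIT_DF28.items()})
```

With n = 30, r = .75 gives t = 6.0, well past the .01 value of 2.763, and r = .39 gives t of about 2.24, past the .05 value of 2.048 but not the .01 value, consistent with the levels reported above.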

One week later the Classification Inventory 
and the rating blank were again administered 
to the second group. This time the rating blank was administered first, with the same instructions as before. The Inventory, however, was administered under different conditions: the subjects were informed that it was a test of self confidence and were instructed to respond to the questions in a manner which they felt would give them a high score in self confidence, whether they actually felt that way or not.

The correlation between the self rating on 
confidence and the Classification Inventory 
with instructions to falsify was .07. The cor- 
relation between the Jurgensen Classification 
Inventory administered first under standard
conditions and then with instructions to falsify was
only .17. The mean increase for the group on
the Classification Inventory was 12.8 points 
and is significant at the .01 level. The fact 
that the subjects could significantly increase 
their scores on self confidence when they de- 
sired to do so gives some evidence of criterion 
validity. The retest reliability correlation of 
the rating blank was .91. 
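The 12.8-point mean increase is a gain measured twice on the same thirty subjects, so a paired (correlated-means) t test is the natural check. The study reports neither the test used nor the individual scores, so the Python sketch below is a generic, assumed implementation and contains no study data.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Mean gain and paired t statistic for two scores on the same subjects.

    Returns (mean difference, t, degrees of freedom).
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length lists with at least 2 subjects")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # sample SD of the differences
    return d_bar, d_bar / se, n - 1
```

A low correlation between the two administrations, such as the .17 reported here, enlarges the standard deviation of the differences and so makes a given mean gain harder to establish.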
If the subjects had not known what trait was
being tested they might or might not have been 
able to influence their scores. Then too, this 
was a group of university students with above 
average education and probably above average 
intelligence which also may affect the ease with 
which they could falsify the Classification In- 
ventory. The fact remains that this group 
could greatly affect their scores on the Classi- 
fication Inventory when motivated to do so. 


Summary 


A scoring key for self confidence was de- 
veloped on the Jurgensen Classification In- 
ventory using a self rating blank of self con- 
fidence as a criterion. The Inventory was 
given to a new group twice, first with standard 
instructions, then with instructions to try to 
get a good score on self confidence. It was 
found that: 

1. A valid key for the measurement of self 
confidence as described in the study and as 
measured by the rating blank was developed 
on the Classification Inventory. 

2. While there was a high correlation between weighted and unweighted scores,
weighted scores gave a much higher validity 
coefficient. 

3. When measuring self confidence, as defined above and as measured by the rating blank, and when this group of education students was told what the test was measuring, the Classification Inventory was found to be susceptible to falsification.


Received June 30, 1950. 
Job Satisfaction of Liberal Arts Graduates
Gail M. Inlow 


Northwestern University 


Anyone who attends a college or who is in 
any way responsible for a college program rea- 
sonably makes the assumption that a college 
education leads to certain outcomes. They 
may be personal, social, economic, or a com- 
bination of these. Regardless, one of the most 
important problems facing higher education 
today is to discover what the specific outcomes 
are. With a knowledge of outcomes, college 
administrators are better able to formulate a 
program on a functional basis. 


Problem 


On the premise that the vocational experiences of students after they have left college are one determinant of what a college program
should be, the investigator embarked on a sur- 
vey of selected male graduates of the College 
of Liberal Arts of Northwestern University for 
the following purposes: (1) To determine how 
well satisfied graduates are with their occupa- 
tions; and (2) To discover what factors are 
related to job satisfaction. 


Method 


The data-gathering technique which was 
employed was a questionnaire. It was transmitted on November 29, 1949, and the final
usable return was received on January 14, 
1950. The only follow-up letter which was 
employed was mailed on December 8, 1949. 
The questionnaire covered five interest areas 
as follows: personal data, undergraduate ex- 
periences, attitudes toward undergraduate 
experiences, vocational experiences, and job 


attitudes. Inasmuch as the major emphasis 
of this study is on job satisfaction, data per- 
taining to the other interest areas will be re- 
ferred to only when they are related to the 
main topic. 

Sample 

Male liberal-arts graduates of Northwestern 
University made up the sample. Five gradu- 
ating classes were surveyed, but they are 
treated in this study as three groups. The 
classes of 1927 and 1928 are identified as Group 
I; of 1937 and 1938, as Group II; and of 1948,
as Group III. Data regarding the sample are 
contained in Table 1. 

The 229 full-time workers who responded 
had residences in 36 states. Approximately 
50 per cent were from Chicago and its suburbs, 
and an additional 20 per cent were from the 
remainder of Illinois and the states which
border on it. All but three members were of 
the White race. The occupations of the sample 
members will be described subsequently. 

The 46.2 per cent of usable returns received by the investigator may be compared with 48.3 per cent reported by Greenleaf (3, p. 3), 49.5 by Eurich and Pace (1, p. 8), 70 by Pace (6, p. 22), and 70 by Tunis (10, p. 13). The returns of the Greenleaf and Eurich and Pace studies are comparable to those of the Northwestern study, but the returns of the Pace and Tunis studies are considerably higher. The Pace study was financed by a Carnegie Foundation grant and involved a large number of the University of Minnesota faculty. Its questionnaire was graphically illustrated, and five follow-up notices were transmitted. All of these factors helped to increase the size of the sample. In the Tunis study, much of the information reported by the author was obtained from the Twenty-Fifth Year Book of the Harvard class of 1911. In addition, the personal influence of Tunis on his classmates may have been a factor of importance.


Table 1
The Questionnaire Returns

                                 Group I      Group II     Group III
                                 1927-1928    1937-1938    1948
Number Surveyed                     127          231          244
Number Who Responded                 67          101          110
Per Cent Who Responded              53.7         43.7         45.1
Number Full-Time Workers
  Who Responded                      67           98           64

As a means of determining the representa- 
tiveness of the sample, the investigator com- 
pared: (1) The number of respondents and 
non-respondents in Groups I and II on the 
factor of senior grades; (2) The number of 
respondent and non-respondent lawyers, phy- 
sicians, and dentists of Groups I and II who 
had received their professional degrees from 
Northwestern University; and (3) The number 
of respondent and non-respondent students of 
Group III who were enrolled as of 1 January, 
1950, in the graduate school of Northwestern 
University. On the factor of senior grades, 
the averages of the respondents were signifi- 
cantly higher at the 5 per cent level of confi- 
dence than those of the non-respondents. A 
slightly greater number of lawyers, physicians 
and dentists responded than did not respond, 
but the difference failed the test of significance. 
Approximately the same number of graduate 
students of Group III responded as did not 
respond.

Despite the attempts made by the investiga- 
tor to determine the representativeness of the 
sample, conclusive proof was not uncovered 
that the sample was either typical or atypical. 
It is entirely possible that the sample included
less than its proportionate share of unemployed 
or low-salaried employees. It is equally 
possible, however, that a disproportionate 
number of high-salaried employees were re- 
luctant to divulge the salary information re- 
quested in the questionnaire. 


Criterion of Job Satisfaction 


The criterion of job satisfaction which was 
employed was a continuum consisting of the 
following seven categories: (1) Completely 
satisfied; (2) Well satisfied; (3) More satisfied 
than dissatisfied; (4) Equally satisfied and dis- 
satisfied; (5) More dissatisfied than satisfied; 
(6) Very dissatisfied; and (7) Completely dis- 
satisfied. 

The graduates were asked to subscribe to one 
item only on the list. Subsequently, arith- 
metical weights of 1 to 7 were assigned to the 
items, thereby making possible the quantifying 
and comparing of results. Hereafter in the 
study, the seven-point continuum will be re- 
ferred to as the Job-Satisfaction Scale, and the 
means of groups, as job-satisfaction scores. 
If the members of a selected group should be 
identified as having a job-satisfaction score of 
2.50, the assumption would be made that the 
extent of their satisfaction was equidistant 
from the intervals: “Well satisfied” and “More 
satisfied than dissatisfied.” 
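The scoring of the Job-Satisfaction Scale reduces to a weighted mean. The following Python sketch computes a group's job-satisfaction score from the number of respondents checking each category; the names `SCALE` and `job_satisfaction_score` are illustrative.

```python
# The seven categories of the Job-Satisfaction Scale, keyed by their weights.
SCALE = {
    1: "Completely satisfied",
    2: "Well satisfied",
    3: "More satisfied than dissatisfied",
    4: "Equally satisfied and dissatisfied",
    5: "More dissatisfied than satisfied",
    6: "Very dissatisfied",
    7: "Completely dissatisfied",
}


def job_satisfaction_score(counts):
    """Mean weight for a group, given {category weight: number of respondents}.

    Lower scores mean greater satisfaction.
    """
    if any(c not in SCALE for c in counts):
        raise ValueError("categories must be 1-7")
    n = sum(counts.values())
    return sum(weight * k for weight, k in counts.items()) / n
```

A group split evenly between categories 2 and 3 scores 2.50, the midpoint case described above.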


Results 


In Table 2, the sample members' expressed attitudes toward their jobs are classified by the job-satisfaction categories to which they subscribed, and the mean job-satisfaction score of each group is given.


Table 2
Job Satisfaction, by Class Groups

[Table: number and per cent of each class group in J. S. Categories 1 through 7, with totals and mean scores]


Table 3
Job Satisfaction, by Occupational Groups

[Table: J. S. Categories 1 through 7 and mean job-satisfaction scores for each occupational group]

Occupational Groups
 1. Minister-Soc. Serv.
 2. Lawyers
 3. Top Management
 4. Educators
 5. Physicians, Dentists
 6. Scientific-Technical
 7. Production and Opns.
 8. Technical Staff
 9. Government
10. Sales
11. Clerical
12. Total Professional
13. Total Bus. and Ind.

The most significant implication of the data 
contained in Table 2 is that the preponderant 
majority of the sample members were reason- 
ably well satisfied with their jobs. Of the 229 
full-time workers in the sample, 175 or 76.3 
per cent indicated that they were completely 
satisfied (category 1) or well satisfied (category 
2). Only 11 members or 4.9 per cent were more 
dissatisfied than satisfied (category 5), very 
dissatisfied (category 6), or completely dis- 
satisfied (category 7). 

Other investigators on job satisfaction report 
that workers are essentially satisfied with their 
jobs. Pace (6, p. 40), in the Minnesota survey, 
writes that the “typical man or woman in this 
study did not love his job but he liked it.” In 
Hoppock’s (4, p. 166) study of 500 teachers, 
477 subscribed to the 6 positive categories of a 
12 point continuum which ranged from the 
statement: “I like it (the job) fairly well” to 
“I love it.” Only 23 subscribed to the 6
negative categories which ranged from the 
statement, “I like it a little” to “I hate it.” 
Roper (7, p. 43) reports that of the executive 
and professional representatives of the sample 
which he studied, 85 per cent liked their jobs 
all of the time; 14 per cent, some of the time; 
and only 1 per cent, none of the time. Kitson 
(5, p. 49), in a study of 247 teachers and 140 


nurses, identified only 9 per cent of the former 
and 8 per cent of the latter as dissatisfied with 
their jobs. 

The mean job satisfaction scores of the three 
groups are 1.9, 1.9, and 2.6 respectively. The 
difference between the mean score of Group III 
and that of Group I or II is significant well be- 
yond the 1 per cent level of confidence. By a 
comparison of the mean scores of Groups I and 
II, it may be observed that the degree of job 
satisfaction was very little different for groups 
of individuals who had been employed 10 to 11 
years and 20 to 21 years. Apparently the 
most significant change had taken place some- 
time during the first 10 or 11 years of employ- 
ment. In an attempt to identify the reasons 
for the increase in job satisfaction with time 
spent on the job, the investigator will sub- 
sequently relate job satisfaction to: (1) Factors 
of the job itself; (2) Personal data; and (3) 
Campus experiences. 
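The study reports significance levels for differences between mean job-satisfaction scores without naming the test. One test common at the time is the critical ratio, the difference between two means divided by its standard error; the Python sketch below assumes that test and takes the individual category weights (1 to 7) of two groups. For small groups a t test would be the more careful choice.

```python
import math
from statistics import mean, stdev


def critical_ratio(x, y):
    """Critical ratio (large-sample z) for the difference between two group means.

    x, y: individual job-satisfaction weights for the two groups.
    Returns (ratio, two-sided p from the normal curve).
    """
    se = math.sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    ratio = (mean(x) - mean(y)) / se
    return ratio, math.erfc(abs(ratio) / math.sqrt(2))
```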


Factors of the Job Itself 


In this section, job satisfaction will be related 
to the factors of occupational categories, time 
on the job, and job status. 

Occupational categories. Table 3 shows the degree of job satisfaction of the 222 full-time workers of the sample who responded to the relevant questionnaire items. The table lists the occupations of the sample members in rank order on the basis of the job-satisfaction scores of the three class groups, listed in the last column.

The significant implications of the data con- 
tained in Table 3 are that the job satisfaction 
scores vary with the specific occupational cate- 
gories considered but that when the time-on- 
the-job factor is controlled by eliminating 
Group III from the analysis, the various scores 
tend to become more alike. A comparison of 
the last two columns illustrates the point. 
Little change in job satisfaction takes place in 
the professions because only 4 of the newer 
graduates (Group III) were in professional 
jobs—all of these in education (Line 4). The 
difference is more obvious in the case of sci- 
entific-technical jobs (Line 6); production and 
operations (Line 7); and sales (Line 10). The 
only category in which older employees were 
less satisfied than newer employees is in that 
captioned, technical staff (Line 8). 

When the graduates of Group III are included in the analysis, a statistically significant
difference, based on the Chi Square test, exists 
between professional employees (Line 12) and 
those of business and industry (Line 13). 
When Group III is excluded, the difference is 
not significant. 
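The Chi Square test used here, and later for salary and time on the job, compares the observed counts in a contingency table with the counts expected if the row and column classifications were unrelated. The Python sketch below computes the statistic and its degrees of freedom. The study does not report how categories were grouped or whether a correction was applied, so the table layout and the absence of a continuity correction are assumptions; the 1 per cent critical values are standard table values.

```python
# 1 per cent critical values of chi-square for 1-6 degrees of freedom,
# from a standard table.
CRITICAL_1PCT = {1: 6.635, 2: 9.210, 3: 11.345, 4: 13.277, 5: 15.086, 6: 16.812}


def chi_square(table):
    """Pearson chi-square statistic and df for an r x c table of counts.

    Rows might be occupational groups and columns job-satisfaction categories.
    Sparse categories usually need to be combined first; a zero expected
    count raises ZeroDivisionError here.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df
```

A difference is significant at the 1 per cent level when the statistic exceeds `CRITICAL_1PCT[df]`.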

Data reported by Hoppock (4, p. 36) and 
Uhrbrock (11, p. 370) seem to substantiate 
the findings of the Northwestern study to the 
effect that job satisfaction is a function of the 
type of occupation in which one is engaged. 


Table 4
Job Satisfaction Related to Salary and Time on the Job

                                  Mean J. S.
Salary Intervals                    Scores
1. Under $3,000                      2.80
2. $3,000-3,999                      2.45
3. 4,000-4,999                       2.31
4. 5,000-7,499                       2.03
5. 7,500-9,999                       1.76
6. 10,000-19,999                     1.71
7. 20,000 and over                   1.56

Time on the Job
1. Less than 1 yr.                   2.46
2. 1-2 yrs.                          2.38
3. 3-5 yrs.                          1.92
4. 6 or more yrs.                    1.90
Salary and time on the job. Because the inclusion of Group III, as indicated in Table 3, decreased the overall job-satisfaction scores of
the sample, the investigator sought to deter- 
mine the reasons why the newer workers in the 
job market were less satisfied. Two possible 
reasons were salary and time on the job. The 
relationship of each of these two factors with 
job satisfaction may be determined by an analy- 
sis of the data contained in Table 4. 

The factor of salary seems to be intimately 
related to job satisfaction. When the Chi 
Square test was applied, significant differences 
at the 1 per cent level of confidence resulted 
between groups whose salaries are under and 
over $3,000 per year and groups whose salaries 
are from $5,000 to $7,499 and over $7,500. 
By observation, the conclusion may be drawn 
that job satisfaction has a positive linear rela- 
tionship with salary. 

This finding is in accord with the study of 
1400 employees made by the Standard Register 
Company (8, p. 83), in which is reported a 
positive correlation between job satisfaction 
and salary. On the other hand, Fortune Magazine (12, p. 106) adopts a different viewpoint when it states that “Money is no longer
a prime incentive in getting a good day’s work 
out of the boss.” 

The relationship between time on the job and 
job satisfaction was likewise found to be sig- 
nificant. The Chi Square test revealed a dif- 
ference which was significant at the 1 per cent 
level of confidence between groups the members 
of which had worked less than and more than 
3 years. After the third year of employment, 
differences in job satisfaction were practically 
negligible. Compared with the 1.92 score of 
the 3-5 yr. group were scores of 1.89 for the 
6-10 yr. group, 1.90 for the 11-15 yr. group, 
and 1.91 for the 16 yrs.-or-over group. The 
conclusion may therefore be reached that the 
first two or three years of employment reveal 
the greatest depressing influence on job satis- 
faction. 

Status on the job. When the full-time 
worker sample was divided into groups: (1) 
owners, part-owners, or supervisors and (2) 
non-supervisory employees, the former had a 
job satisfaction score of 1.95; the latter, a score 
of 2.33. The difference was significant at the 
2 per cent level of confidence. When the 
scores of the supervisory employees were compared with those of the non-supervisory group, the difference was not significant.

These results are similar to those reported 
by Hoppock (4, p. 130). 


Personal Factors 


The preceding sections have treated factors of the job and their relationship with job satisfaction. In this section, the personal factors of marital status, age, and retirement plans are related to job satisfaction.

Marital status and age. In Table 5, the 
factors of marital status and age are related to 
job satisfaction. 

On the factor of marital status, statistically 
significant differences at the 1 per cent level of 
confidence resulted when the mean job satis- 
faction score of the group of married persons 
with or without children was compared with 
the score of the group of single persons. A 
comparison between the mean scores of the 
childless married-couple group and the married 
group with children just failed the test of 
significance. 

In contrast to the findings of this study on marital status, Hoppock (4, p. 35) reports that
job satisfaction and marital status were not 
related in a significant way. Fryer (2, p. 29) 
states that the married members of the varied 
occupational sample with which he worked 
“showed a slightly higher tendency to be in- 
terested in their occupation.” 

The job satisfaction scores of the various age 
groups listed in Table 5 appear to be irregularly 
curvilinear. The only statistically significant 
difference in job satisfaction which was dis- 
covered was between groups of persons under 
and over 27 years of age. Although this difference is highly significant, it is possible
that the depressing effects of low salaries and 
limited time spent on the job are more im- 
portant influences on job satisfaction than age 
itself. Certainly, age as a causal factor can- 
not be proved. 

Hoppock (4, p. 40), working with a sample 
of 500 teachers, and Thorndike (9, p. 706), with 
a sample of younger individuals in industry, 
found that job satisfaction increased somewhat 
with age. Fryer (2, p. 29), working with a 
heterogeneous occupational group, reported 
that, “Age appears to have practically no 
effect.” 

Retirement plans. The sample members 
were asked to indicate whether they were 
satisfied, dissatisfied, or undecided about their 
financial plans for retirement. A statistically 
significant difference of criterion scores resulted 
between groups the members of which were 
satisfied (1.82) and undecided (2.39). The 
difference between those who were satisfied 
(1.82) and those who were dissatisfied (2.19) 
was not significant. Apparently indecision 
had more effect than dissatisfaction. 

Table 5
Job Satisfaction Related to Marital Status and Age

                                      Mean J. S.
Marital Status                          Scores
1. Single                                2.78
2. Married, no children                  2.26
3. Married, 1 child                      1.89
4. Married, 2 children                   1.95
5. Married, 3 or more children           1.76
6. Total married                         1.99

Age
1. 22-27 yrs.
2. 28-33 yrs.
3. 34-39 yrs.
4. 40-45 yrs.
5. 46 yrs. and over


Campus Experiences

So far, selected job and personal factors have been analyzed in the light of job-satisfaction scores. In this last part of the study, attention will be focused on the selected campus factors of grades, campus belongingness, and the affinity between college major and job duties.

Grades. Senior grades were available to the investigator only for Groups I and II. The relationship between senior grades and job satisfaction was found to be
negligible. When the grades of the two class 
groups were divided into quarters, the mean 
job satisfaction scores, by quarter, from the 
lowest to the highest, were found to be 1.91, 
1.93, 1.90, 1.90. The conclusion is obvious 
that the senior grades of graduates out of 
College from 10 to 20 years have no statistical 
relationship with job satisfaction. 

Campus belongingness. The sample mem- 
bers were divided into the following groups: 
(1) graduates who belonged to no campus
organizations of any type; (2) graduates who
were fraternity members only; (3) graduates 
who were non-fraternity men but who belonged 
to one or more other campus organizations; 
and (4) graduates who belonged both to 
fraternities and to one or more other campus 
organizations. 

When the mean job-satisfaction scores of 
groups (1) and (4) supra were compared (2.32 
and 1.81 respectively) a statistically significant 
difference at the 1 per cent level of confidence 
resulted. Other inter-group comparisons 
failed the test of significance. 

Affinity between college major and job duties. 
The graduates were asked to state whether 
their job duties were: (1) The same as, (2) 
Closely related to, (3) Somewhat related to, 
(4) Distantly related to, or (5) Entirely unre- 
lated to their college subject-matter majors. 
The mean scores of these five groups were: 
1.89, 2.00, 2.03, 2.28, and 2.57 respectively. 
When the mean score of groups (1), (2), and
(3) was compared with that of groups (4) and
(5), the difference was found to be significant
well beyond the 1 per cent level of confidence; 
in fact, chance alone could have brought about 
such a great difference in only 14 cases out of 
10,000. 
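The closing figure can be turned back into a normal deviate. A minimal sketch using Python's standard `statistics.NormalDist`, assuming that "14 cases out of 10,000" is a two-tailed probability under a normal approximation (the article states neither point).

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# "Only 14 cases out of 10,000", read as a two-tailed probability.
p_two_tailed = 14 / 10_000

# Deviate that leaves p/2 in each tail of the normal curve.
z = std_normal.inv_cdf(1 - p_two_tailed / 2)

# Two-tailed cut-off for the 1 per cent level, for comparison.
z_one_per_cent = std_normal.inv_cdf(1 - 0.01 / 2)

print(f"z = {z:.2f}; 1 per cent cut-off = {z_one_per_cent:.2f}")
```

Under these assumptions the difference corresponds to a deviate of about 3.19, comfortably past the 2.58 needed for the 1 per cent level.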


Summary 


The problem of the study was to discover 
how well satisfied male graduates of the College 
of Liberal Arts of Northwestern University 
were with their occupations and what factors 
were related to this feeling of satisfaction. 

The working sample consisted of 229 full- 
time male workers of the classes of 1927-1928 
(Gp. I), 1937-1938 (Gp. II), and 1948 (Gp.
III).

A questionnaire was employed as the data- 
gathering technique. The criterion of job 
satisfaction which was employed was a seven- 
point continuum. Graduates were asked to 
subscribe to one point only on the continuum, 
and arithmetical values were subsequently 
used to quantify the data. 
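The article does not list the arithmetical values it used. A minimal sketch of one way to quantify the continuum, assuming the points are numbered 1 (most satisfied) to 7 (least satisfied). That direction is inferred, not stated: groups described as more satisfied have lower reported means (for example 1.82 for those satisfied with retirement plans against 2.39 for the undecided). The function name and the sample responses are hypothetical.

```python
from statistics import mean

# Assumed keying: 1 = most satisfied ... 7 = least satisfied. The article names
# only the top two points ("completely satisfied", "well satisfied") and the
# bottom two ("very dissatisfied", "completely dissatisfied").
VALID_POINTS = range(1, 8)


def criterion_score(responses):
    """Mean criterion score for a group; each respondent marks exactly one point."""
    for point in responses:
        if point not in VALID_POINTS:
            raise ValueError(f"{point!r} is not a point on the seven-point continuum")
    return mean(responses)


# HYPOTHETICAL group of eight respondents.
print(criterion_score([1, 1, 2, 2, 2, 3, 1, 4]))
```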

The results indicated that graduates of the 
College were essentially satisfied with their 
jobs but that the amount of satisfaction was 
dependent in varying degrees on the factors of: 
(1) time on the job, (2) salary, (3) status on 
the job, (4) age, (5) marital status, (6)
retirement plans, (7) campus belongingness,
and (8) the affinity between job duties and 
college major. Job satisfaction was shown to 
have no relationship with college grades and 
only a limited relationship with job categories. 

Graduates of the sample were basically 
satisfied with their occupations as evidenced 
by the fact that 175 of the 229 full-time 
workers subscribed to the highest two items 
on the seven-point continuum scale of job 
satisfaction. These 175 persons identified 
themselves as completely or well satisfied. 
Only three persons subscribed to the two low- 
est continuum ratings of “very dissatisfied” 
and “completely dissatisfied.” 
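The Summary counts can be restated as shares of the 229 full-time workers. The number choosing one of the middle three points is found by subtraction, since the article gives no separate count for it.

```python
total = 229           # full-time workers in the working sample
top_two = 175         # "completely" or "well" satisfied
bottom_two = 3        # "very" or "completely" dissatisfied
middle_three = total - top_two - bottom_two  # everyone else

for label, count in [
    ("top two points", top_two),
    ("middle three points", middle_three),
    ("bottom two points", bottom_two),
]:
    print(f"{label}: {count} ({100 * count / total:.1f} per cent)")
```

Roughly three graduates in four placed themselves at the top two points, against about one in a hundred at the bottom two.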

Job satisfaction, however, was shown to be 
related to certain job, personal, and campus 
factors. Rather than being discrete, job satis- 
faction appears to be a part of a general pattern 
of the personality, interests and experiences of 
the individuals who were surveyed. 


Received August 14, 1950. 
A Note on Interviewer Bias*


Gardner Lindzey 


Harvard University 


Most sources of indeterminism in psycho- 
logical science have been little explored in 
spite of their crucial importance for empirical 
research. One exception to this generalization 
is the investigation of interviewer bias. This 
parameter has been much studied and a few of 
these studies have supplied highly pertinent 
information concerning the conditions sur- 
rounding its operation (1, 2, 3, 5, 6, 7, 9). 

Certainly the most dramatic demonstration
of interviewer bias, and a study that suggests
itself as a baseline for further investigation, is
the experiment reported by Stanton and Baker
(8). These investigators exposed “meaningless”
geometric figures to 200 students in a
classroom setting. At varying intervals fol- 
lowing this period five professionally trained 
interviewers asked each of the subjects to 
select from pairs of figures the ones they had 
seen in the classroom exposure. In each case 
one of the figures was the one originally ex- 
posed and the other was its mirror image. 
Previous to the interviews each interviewer 
was given a code sheet containing the “cor- 
rect” responses for each of the pairs of figures. 
Actually the code sheets had been arranged so 
that one half of the cues provided the inter- 
viewer were correct and the other half were in- 
correct. The results of this study showed a 
highly significant tendency on the part of the
interviewers to report more correct responses 
in the cases where their code sheets provided 
them with a “correct” cue than in the cases 
where the code sheet provided them with an 
“incorrect” cue. That is, when the key gave 
as the correct response the figure actually 
exposed in class, there was a higher incidence 
of correct responses than when the key indi- 
cated that the figure actually exposed in class 
was the wrong alternative. Thus, although 
the interviewers were experienced, had been 
specifically cautioned against biasing the data, 
and although their task consisted merely of 
presenting cards to the subjects and asking for
a simple discrimination, the biasing informa-
tion they were given apparently produced a
definite shift in the interviewee responses they
reported. Friedman (4) repeated this study
under very similar circumstances and failed,
however, to confirm the findings of the earlier
investigators.

* This research was made possible by a grant from the
Laboratory of Social Relations, Harvard University.

The present study attempted to provide a 
third test of the relationship reported by Stan- 
ton and Baker and at the same time extended 
the experimental procedure so that if inter- 
viewer bias was observed, further statements 
could be made as to its mediating factors. 


Procedure 


Subjects. The subjects in this study were 
85 students of Harvard and Radcliffe Colleges 
who were enrolled in an undergraduate social 
science course. 

Administration of Stimulus Material. Dur- 
ing a regular class period the twelve geometric 
figures employed by Stanton and Baker and 
reproduced in their article (8, p. 381) were 
shown to the subjects by means of a projector. 
Four additional figures were projected upon 
the screen following the standard twelve. The 
results for these four figures will not be dis- 
cussed in this article. The students were told 
only that this was an experiment in a kind of 
symbolic learning and that they would be 
asked after watching one series of exposures to 
identify in a second series those pictures that 
were exposed in the same serial order in the 
first and second series. After the second ex- 
posure they were asked to volunteer for a 
further experiment that was to be conducted 
four days later. Approximately one half of the 
class appeared for the second half of the experi- 
ment, at which time they were individually 
shown a series of cards on which there were 
line-drawing reproductions of the geometric 
figures they had originally seen. Each figure 
was paired with its mirror-image. Position of 
the original figure and its mirror-image was 
systematically varied among the packs of cards 
used by the different interviewers. The
subjects were asked to select from each pair the
figure they had actually seen in class. 

Interviewers. The interviewers in this study 
were eleven graduate students all of whom had 
had experience in interviewing and a number of 
whom had taken an advanced course in inter- 
viewing methods. The interviewers were 
given an instruction sheet outlining specifi- 
cally what they were to do and emphasizing the 
importance of avoiding any form of interviewer 
bias. In addition, they were cautioned verbally
against letting their knowledge of the correct
response bias the data they collected. They
were also given “doctored” code sheets which
together with the
instruction sheets were a replication of the 
forms reported in Stanton and Baker’s article. 
The task of each interviewer was to present 
the stimulus cards, request the discrimination 
and record the response on the code sheet, 
where the “correct” response for each item was 
placed conspicuously. As in the case of the 
original experiment, half of the “correct” re- 
sponses indicated on the code sheet corre- 
sponded to the figure actually shown in class 
and the other half did not. 
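The text does not say how the true and false cues were distributed over the figures. A minimal sketch of a doctored code sheet of the kind described, assuming twelve figures and a random half-and-half split; the function name, the `"original"`/`"mirror"` labels and the seed are hypothetical.

```python
import random


def doctored_code_sheet(n_figures=12, seed=None):
    """Code sheet on which half the 'correct' cues are true and half are false.

    For each figure the cue names either the figure actually shown in class
    ("original") or its mirror image ("mirror"). Exactly half the cues point
    to the original, in random order.
    """
    if n_figures % 2:
        raise ValueError("need an even number of figures to split cues in half")
    cues = ["original"] * (n_figures // 2) + ["mirror"] * (n_figures // 2)
    random.Random(seed).shuffle(cues)
    # Figures are numbered from 1, as in the tables.
    return {figure: cue for figure, cue in enumerate(cues, start=1)}


sheet = doctored_code_sheet(seed=1950)
print(sorted(f for f, cue in sheet.items() if cue == "original"))
```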


Results and Discussion 


Table 1

Accuracy of Responses with Correct and
Incorrect Interviewer Cues*

          Per Cent        Per Cent
Figure    Correct Cue     Incorrect Cue    Difference
  1          72%             57%              15%
  2          44              49               -5%
  3          64              67               -3%
  4          46              57              -11%
  5          66              72               -6%
  6          70              56               14%
  7          49              47                2%
  8          56                                1%
  9          60              54                6%
 10          59              70              -11%
 11          56              40               16%
 12          49              47                2%

*Mean difference = 1.67; t = .61; p > .50.

The main question to be examined in this
study was whether the interviewers reported
systematically different responses when their
code sheet gave a “correct” cue than they did
when it gave an “incorrect” cue. That is, 
was the percentage of correct responses re- 
ported higher when the cue provided was in 
agreement with what the subjects had actually
witnessed in class or did provision of this in- 
formation have no effect upon accuracy? The 
results reported in Table 1 indicate that, al- 
though there was a slight difference in accuracy 
between the correctly and incorrectly coded 
responses, this difference was not of sufficient 
magnitude to warrant rejection of the null
hypothesis. This finding, coupled with
Friedman’s, casts into serious question the
relationship initially reported by Stanton and Baker.
Only one casual observation suggests a pos- 
sible means of accounting for the difference be- 
tween the findings of the investigations in 
question. In the present study considerable 
emphasis was placed upon the importance of 
preventing the interviewee from seeing the code 
sheet of the interviewer and all interviewers 
were cautioned to prevent this. In addition, 
each interviewer was given a stack of books to 
be used in protecting his record sheet from the 
subject. In spite of this, one interviewer
reported that it seemed to him that the subjects
could see his record sheet. None of the other
interviewers reported this, and upon question- 
ing they did not think that their subjects had 
been able to see the record sheet. Examina- 
tion of the data reported by the deviant inter- 
viewer revealed that his interview material 
showed much more tendency in the direction 
reported by Stanton and Baker than the data 
of any of the other interviewers. Thus, re- 
moving his responses from the distribution re- 
sulted in the data reported in Table 2. This 
somewhat fragmentary observation suggests 
that if similar conditions favoring subject- 
copying had operated for the other interviewers 
we would have secured results approximating 
those of the earlier study. This finding, 
coupled with the fact that Stanton and Baker, 
aside from a casual written admonition to the 
interviewers, do not report any special efforts 
to prevent subject-copying suggests that this 
might account for the difference in the findings 
of the two studies. We can be sure only that 
our study has failed to demonstrate the
relationship in question and has suggested one
means whereby such a relationship might be
demonstrated as a result of inadequate
experimental control.

Table 2

Accuracy of Responses with Correct and
Incorrect Interviewer Cues Omitting
Deviant Interviewer*

          Per Cent        Per Cent
Figure    Correct Cue     Incorrect Cue    Difference
  1          72%             56%              16%
  2                                          -12%
  3                                           -4%
  4                                           -8%
  5                                           -9%
  6                                           12%
  7                                            5%
  8                                            5%
  9                                            5%
 10                                          -17%
 11                                           15%
 12                                            2%

*Mean difference = .50; t = .16; p > .50.
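The statistics printed beneath Table 1 (1.67 and .61) can be reproduced by treating the twelve figures as paired observations: the mean of the Difference column, and a t ratio for that mean against zero with 11 degrees of freedom. The article does not name its test, so this is a reconstruction under that assumption, a sketch rather than the author's own computation.

```python
import math
from statistics import mean, stdev

# Difference column of Table 1: per cent correct with a correct cue minus
# per cent correct with an incorrect cue, Figures 1-12.
differences = [15, -5, -3, -11, -6, 14, 2, 1, 6, -11, 16, 2]

n = len(differences)
mean_diff = mean(differences)
se = stdev(differences) / math.sqrt(n)  # standard error of the mean difference
t = mean_diff / se                      # df = n - 1 = 11

print(f"mean difference = {mean_diff:.2f}, t = {t:.2f}, df = {n - 1}")
```

A t of this size with 11 degrees of freedom is far below any conventional significance level, in line with the p > .50 reported.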

At several points our procedure departed 
from the procedure employed in the original 
experiment. First, our subjects were
volunteers and represented only a little over one
half of the persons to whom the stimuli were
initially exposed. This was not true of the 
original experiment where apparently almost 
every person exposed to the stimuli was inter- 
viewed. Second, we added four slides at the 
end of the original presentation that had not 
been employed in the original experiment. 
Third, there were obvious differences in the 
interviewers, subjects and social and temporal 
setting of this experiment and the first one. 


We see no reason to expect any of these vari- 
ables to show sufficient co-variance with inter- 
viewer bias to account for the difference in 
Stanton and Baker’s results and our own. 
Nevertheless they exist as possible contribu- 
tors to this difference. 

In conclusion, we have failed to demonstrate 
the operation of interviewer bias under condi- 
tions closely similar to those under which 
Stanton and Baker found evidence for it. Our 
results, together with the results of Friedman, 
suggest that the finding originally reported 
should not be considered demonstrated without 
further positive evidence. 


Received August 4, 1950. 
Differences between Volunteers and Non-Volunteers
for Psychological Studies 


Ephraim Rosen 


University of Minnesota 


Innumerable psychological and sociological 
studies have made use of volunteer subjects in 
the belief, apparently, that adequate coopera- 
tion of non-volunteer subjects cannot be 
secured for research into many areas of human 
behavior. Studies of sexual behavior, marri- 
age, fantasy, religion, of any area, in fact, 
touching on highly ego-involved and defensive 
attitudes and beliefs, have usually depended 
upon volunteer subjects. 

Yet few studies have attempted the method- 
ologically relevant task of exploring the nature 
of volunteers per se. Little or no work has 
been done on such problems as the reliability 
of volunteering behavior, the differences be- 
tween volunteers and non-volunteers, and the 
consistency of such differences in a variety of 
conditions under which individuals volunteer. 

Previous studies are quickly summarized. 
Norman (8) reviews research dealing with dif- 
ferences between respondents and non-re- 
spondents to mailed questionnaires. He finds 
that such research has generally reported re- 
spondents to be more ego-involved in the area 
investigated by the questionnaire, more intel- 
ligent, more articulate, better educated, and 
more likely to be members of medium-income 
groups, than non-respondents. Questionnaire 
respondents do not, of course, constitute a 
category identical with volunteers for a psy- 
chological experiment or test; nor are non- 
respondents equivalent to non-volunteers. 
The problem of differences between those who 
do and do not respond to mailed questionnaires 
is, however, parallel to that of differences be- 
tween those who do and do not volunteer for an 
experiment. 

Wallin (10) reports that engaged couples 
who volunteered for a study of factors associ- 
ated with future marital success differed sig- 
nificantly from both non-volunteers and the 
total sample of volunteers and non-volunteers 
in likelihood of a successful marriage. Several 
other differences, not statistically reliable, are 
reported: volunteers were better educated, less
conservative politically, younger, less likely
to be Catholic, and better poised.

In a study of visuo-motor conflict learning, 
Brower (2) reports that volunteers made 
fewer errors than non-volunteers, and took 
more time in two out of three experimental 
tasks. 

Kinsey (6) finds that male volunteers for 
interviews in the area of sexual behavior re- 
ported a greater frequency of total sexual out- 
let, and of all individual outlets except noc- 
turnal emission, than male non-volunteers. 
As reasons for these differences, Kinsey hy- 
pothesizes that volunteers may have been more 
active and aggressive, more cooperative in re- 
sponse to the survey, less inhibited sexually, 
and possibly less likely to have had socially 
taboo items, such as pre-marital intercourse 
and homosexual contacts, in their past his- 
tories. 

Maslow (7) reports that female volunteers 
for an inquiry into sexual attitudes and be- 
havior scored higher than non-volunteers on 
dominance rating, had more sexual experience
prior to marriage, and were far less likely to 
have attitudes of rejection toward sexuality. 

It is clear that the results of previous studies 
of differences between volunteers and non-vol- 
unteers are neither wholly consistent nor wholly 
inconsistent with each other. The possibility 
exists that volunteering shows a partial con- 
sistency in a variety of situations and a partial 
specificity, as a function of situational
differences.


Purpose of the Present Research 


The present research was designed to test 
the following hypotheses: 


1. Volunteering is not wholly a function of 
the specific situation, but is in part a function 
of stable attitudes and personality character- 
istics. 

2. Volunteering behavior should, in conse- 
quence, display a fair degree of reliability. 
3. Personality and attitude differences
between volunteers and non-volunteers should
be demonstrable in a variety of dimensions 
such as defensiveness, anxiousness, dominance, 
psychological-mindedness, and authoritarian- 
ism. The direction of these differences was 
difficult to predict: the author’s predictions of 
direction were borne out in some cases and not 
in others. 

4. The differences should be more marked 
on the variables of personality and attitudes 
than on more objective, “sociological” vari- 
ables such as economic status. This hypothe- 
sis arises from the general assumption that 
volunteering is more directly a function of 
personality and attitude characteristics than 
of the sociological concomitants of these psy- 
chological characteristics. 


Procedure 


Two independent studies were carried out to 
test these hypotheses. 

1. In the fall of 1948 a battery of tests was 
administered to all freshmen entering the 
General College of the University of
Minnesota. The students were then told that they
could take the Minnesota Multiphasic
Personality Inventory (MMPI) if they wished to.
Of the 656 freshmen, 410 (62.5 per cent) chose
to take it.

In 1949, following administration of the 
battery, the freshmen were told that taking the 
MMPI was a routine, scheduled activity. 
Consequently, a 95 per cent sample was
obtained.

For these two groups, the Student Counsel- 
ing Bureau of the University made available 
scores on the MMPI, scores on the 1938 re- 
vision of the Strong Vocational Interest Blank, 
and percentile ranks for scholastic standing in 
high schools previously attended. A sample 
of 70 male and 65 female MMPI profiles was 
randomly selected from the volunteer group; 
and a sample of 115 male and 86 female pro- 
files was randomly selected from the 1949 95 
per cent group. The number of subjects was 
slightly smaller for Strong Blank scores and 
for high school rank, since these items of in- 
formation were not available for all subjects. 

On the MMPI, scores were recorded for all 
subjects on the standard scales—Question, L, 
F, K, Hs, D, Hy, Pd, Pa, Pt, Sc, and Ma (5)— 
and for most subjects on the SIE scale (4).
Scores were appropriately corrected for K. 
For both this study and the second study de- 
scribed below, the following criteria were re- 
quired for inclusion of any MMPI profile in the 
sample: Raw score on the F scale not greater 
than 15 (T score less than 80); T score on the 
L scale not greater than 70; and raw score on 
the F scale not more than nine points greater 
than raw score on the K scale. In addition, 
no subject had a T score on the Question scale 
greater than 57. 
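The inclusion rules translate directly into a filter. A minimal sketch: the `Profile` record and its field names are hypothetical, treating the Question-scale limit as a filter is an assumption (the text reports it as a property of the sample), and the example profiles are invented.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """The MMPI scores the inclusion rules refer to (hypothetical field names)."""
    f_raw: int       # raw score, F scale
    k_raw: int       # raw score, K scale
    l_t: int         # T score, L scale
    question_t: int  # T score, Question scale


def include(p: Profile) -> bool:
    """Apply the inclusion criteria stated in the text."""
    return (
        p.f_raw <= 15                  # F raw score not greater than 15
        and p.l_t <= 70                # L T score not greater than 70
        and p.f_raw - p.k_raw <= 9     # F raw not more than 9 points above K raw
        and p.question_t <= 57         # assumed: Question T not greater than 57
    )


# HYPOTHETICAL profiles: one valid, one with F raw 16, one with F - K = 10.
profiles = [Profile(6, 14, 50, 50), Profile(16, 12, 50, 50), Profile(12, 2, 50, 50)]
print([include(p) for p in profiles])
```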

2. In the second study, students of four
sections of an elementary Psychology
Laboratory class were asked by their respective
instructors to volunteer for “a personality
experiment.” The vague formulation, “a
personality experiment,” was used to make the situation
comparable to, yet not identical with, the per- 
sonality test situation of the first study. 
Sixty-two volunteers, 30 males and 32 females, 
were thus obtained. 

The 30 items of the F (fascism) scale of 
Form 45 of the Berkeley Public Opinion Study 
opinion-attitude scale (1) were administered to 
these subjects. This scale contains such items 
as: 


“Obedience and respect for authority are the 
most important virtues children should learn.” 

“People can be divided into two distinct 
classes: the weak and the strong.” 

“Familiarity breeds contempt.” 

To these 30 items were added four items de- 
signed to measure the subjects’ attitudes to- 
ward Psychology. These items are: 

“Psychology is a rewarding subject for 
study.” 

“The trouble with psychology is that it is not 
practical enough.”

“Psychology courses are highly enjoyable.” 

“Participating in psychological experiments, 
such as this one, is a nuisance.” 


In addition, subjects were requested to 
answer a series of face-sheet items, slightly 
modified from the original face-sheet used in 
the Berkeley study; and to answer a number of 
open-ended questions, not analyzed in the 
present report. A record was kept of the time, 
in minutes, each subject took to complete the 
total questionnaire. 
Two days after the completion of face-sheet
and questionnaire, the four sections were in- 
formed that not enough students had volun- 
teered. The volunteers were excused from 
class and the remaining students were utilized 
as non-volunteer subjects. By this method 
72 non-volunteers were obtained, 33 males and 
39 females. The procedures used with the 
volunteers were then repeated for these non- 
volunteers. 

One week later, the instructors of the four 
sections administered the MMPI to all stu- 
dents. The inventory was presented to them 
as part of the course work, in which a number 
of other tests had already been demonstrated 
and administered although none had been in the 
area of personality. Subjects thus did not 
know the MMPI was connected with the ex- 
periment. 

Three additional measures were available in
the second study. 


a. For one of the four sections a reliability 
measure was obtained by comparing volun- 
teers for this study with volunteers for a later 
investigation. In this later research the
appeal for volunteers was not made by the
section instructor, and both the incentives offered
for volunteering and the stated purpose of the 
investigation differed radically from those of 
the present study. 

b. A count was kept of the volunteers and 
non-volunteers who asked the investigator for 
information on their questionnaire results. 

c. Similarly, a count was kept of subjects 
who came to their section instructors for 
MMPI scores and interpretations. 


In summary, the second study replicated the 
MMPI measures of the General College study 
and added a series of other measures not
available in the General College study. The
Strong Blank was not administered in the 
second study since it seemed that more prof- 
itable dimensions of personality and attitudes 
could be investigated in the short time subjects 
were available, especially in view of the largely 
negative results of the Strong measures for the 
General College subjects (to be described be- 
low). High-school rank was not obtainable 
in the second study; as a substitute, a report of
accumulated grade-point average was
requested on the face-sheet.


Analysis of Data 


Comparisons were made separately for males 
and for females, and for males and females 
combined where such combination was mean- 
ingful. On the MMPI, for example, scores on 
all scales except the Mf scale were compared 
for both sexes combined as well as for the 
separate sexes. 

1. In the General College study, T-score 
means and standard deviations of the MMPI 
scale scores were computed for the 1948 and 
1949 groups, and significance of difference of 
means was evaluated by the t statistic. These
statistics were not computed for the Question, 
L, and F scales, since T-scores on these scales 
had been recorded as 50 for all individuals 
whose raw scores were translatable into a 
T-score of 50 or less. Statistics for these three 
scales would consequently be ambiguous in 
meaning. 
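A small invented example shows why means and standard deviations of the recorded Question, L, and F T-scores would be ambiguous. Flooring every T-score at 50 raises the mean and shrinks the spread, by an amount that depends on how many scores actually fell below 50. A minimal sketch; the scores are hypothetical.

```python
from statistics import mean, pstdev


def record_with_floor(t_scores, floor=50):
    """Mimic the recording rule: any T-score at or below the floor is entered as the floor."""
    return [max(t, floor) for t in t_scores]


# HYPOTHETICAL T-scores on one validity scale.
actual = [38, 42, 45, 50, 53, 57, 61]
recorded = record_with_floor(actual)

print(f"actual:   mean {mean(actual):.2f}, SD {pstdev(actual):.2f}")
print(f"recorded: mean {mean(recorded):.2f}, SD {pstdev(recorded):.2f}")
```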

Strong Blank (9) scores for males were 
analyzed for primary interest patterns accord- 
ing to the technique described by Darley (3). 
The Chi² technique was used to evaluate
presence vs. absence of a primary interest pattern
in the 1948 vs. 1949 male subjects. 

For the female subjects, Strong Blank scores
were recorded for all occupations in which the
subjects had an “A” rating. Using the Chi²
technique comparisons were made to investi- 
gate the possibility of differences in presence 
or absence of an “A” rating on any occupa- 
tion; feminine, non-professional occupations; 
and on the specific occupation of “housewife.” 
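Each presence-versus-absence comparison reduces to a 2 x 2 table (group by pattern present or absent). A minimal sketch of the Pearson chi-square for such a table with one degree of freedom. Whether the author applied Yates' continuity correction is not stated, and this sketch does not apply it; the counts in the usage line are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square, 1 df, for the table [[a, b], [c, d]].

    Rows: 1948 volunteers, 1949 group. Columns: pattern present, absent.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / margins


# Critical values of chi-square with 1 degree of freedom.
CRITICAL_1DF = {"5 per cent": 3.841, "1 per cent": 6.635}

# HYPOTHETICAL counts (present, absent) for the two groups.
chi2 = chi_square_2x2(30, 40, 60, 55)
print(f"chi-square = {chi2:.2f}; significant at 5 per cent: {chi2 >= CRITICAL_1DF['5 per cent']}")
```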

High-school rank means were compared by 
the t statistic.

2. In the second study MMPI scores were 
compared as in the General College study, with 
the exception that comparisons of L and F 
scores were possible in raw score form. 

On the F scale, subjects indicated their re- 
sponses by marking a number from +3 (“I 
agree very much”) to —3 (“I disagree very 
much”), the zero category not being permitted, 
for each item statement. By use of t,
comparisons were made of algebraic total score
means, time spent on the total questionnaire, 
and responses to those face-sheet items which 
were quantitative in nature. 
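A minimal sketch of the algebraic total for one subject, assuming each item is simply summed with the sign as marked (the text describes no reverse keying). The names and the responses are hypothetical.

```python
# Response categories for each F-scale item: +3 "I agree very much" down to
# -3 "I disagree very much"; the zero category is not permitted.
ALLOWED = {-3, -2, -1, 1, 2, 3}


def algebraic_total(responses):
    """Signed sum of one subject's item responses."""
    invalid = [r for r in responses if r not in ALLOWED]
    if invalid:
        raise ValueError(f"responses outside the permitted categories: {invalid}")
    return sum(responses)


# HYPOTHETICAL responses to the first five of the 30 items.
print(algebraic_total([3, -1, 2, -2, 1]))
```

Group means of these totals are what the t comparisons above would operate on.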

The Chi² statistic was used to compare
responses of volunteers and non-volunteers to
each of the four statements dealing with
attitudes to Psychology, responses to
non-quantitative or semi-quantitative face-sheet
items, reliability of volunteering behavior, 
frequency of asking for Questionnaire inter- 
pretation, and frequency of asking for MMPI 
interpretation. 


Results 


1. Means and standard deviations of MMPI 
scores of the General College 1948 volunteers 
and 1949 95 per cent group, for males, females, 
and for both sexes combined, are presented in 
Table 1. None of the differences were
numerically large. The appropriate columns of
Table 3 summarize the comparisons based on
Table 1. A “+” indicates that the 1948
volunteers had a higher scale mean than the 1949
95 per cent group, a “−” that the volunteers
had a lower mean. The level of confidence of
each difference is shown in parentheses after
each “+” or “−”. Of 32 comparisons made,
one difference was significant at the 0.1 per cent
level, one at the 1 per cent level, 3 at the 5 per
cent level, and 7 at various poorer levels,
although all of these differences were
numerically small.

Strong Blank score comparisons revealed no
significant differences between the two General
College male groups in frequency of presence of
a primary pattern, nor between the female
groups in frequency of presence of an “A”
rating on at least one occupation. The 1949
female subjects had a significantly greater
frequency of “A” ratings on feminine occupations
than the 1948 female volunteers, the level of
confidence being 5 per cent. At the weak level of
15 per cent, they also had a greater frequency of
“A” ratings on the occupation of “housewife.”

Differences between mean high-school ranks
of the two General College groups were not
significant for males, for females, or for both
sexes combined.


Table 1
Means and Standard Deviations of General College Groups on MMPI Scores

Male
Female
Both Sexes

1948 Volunteers
1948 Volunteers
1949 95% Group
1949 95% Group
1948 Volunteers    1949 95% Group

M SD M SD M SD
M M SD SD

8.12
7.83
10.76
7.30
9.26
10.17
8.59
10.26
11.41
10.99
9.37

52.75
49.52
48.18
52.23
57.51
52.84
52.09
55.50
56.72
63.56
48.52

9.07
7.17
8.40
7.99
9.33
9.60
8.74
11.00
11.53
11.14
7.28

55.05
48.03
51.38
52.69
55.52
50.29
52.83
55.35
56.20
56.57
50.44

8.56
6.20
7.14
7.01
8.82
8.27
8.29
7.77
8.39
8.97
7.74

50.39
52.66
54.24
60.41
55.37
53.33
58.41
58.63
61.90

SIE 48.25

57.30
49.23
48.81
54.48
57.36
51.51
51.77
53.94
55.13
58.13
50.15

54.76
49.25
52.04
53.49
58.06
53.09
56.94
57.46
59.33
49.34

8.34
7.19
9.22
7.20
9.37
8.45
9.28
10.15
10.42
8.66

9.18
7.26
8.55
8.10
9.50
8.06
9.94

10.23

11.06
8.05

8.09
10.15
8.82


2. Means and standard deviations of MMPI
scores of the volunteers for the “psychological
experiment” are presented in Table 2. As in
the General College study the differences were 
not numerically large. The appropriate col- 
umns of Table 3 summarize these comparisons, 
by indicating direction and level of signifi- 
cance of differences. A “+” indicates that 
volunteers had a higher mean, a “—” a lower 
mean. Of 39 comparisons made, one differ- 
ence was significant at the 1 per cent level, 
two at the 2 per cent level, 7 at the 5 per cent 
level, 1 at just short of 5 per cent, and 5 at 
poorer levels. 

Table 3 also compares differences found
between the General College groups with
differences found between the “psychological
experiment” groups. The purpose of this
comparison was to provide a cross-validation of
Table 2
Means and Standard Deviations of Groups in Second Study on MMPI Scores 





Male 


Female 


Both Sexes 





Non-Volun- 


Volunteers teers 


Volunteers 


Non-Volun- 
teers 


Non-Volun- 


teers Volunteers 





SD M 


M SD M 





M SD M SD 





2.50 
K 62.64 
F* 3.96 
Hs 48.92 
D 55.18 
Hy 56.07 
Pd 63.29 
Mf 64.36 
Pa 50.18 
Pt 59.36 
Sc 57.04 
Ma 59.14 56.94 9.01 53.52 
SIE 44.36 45.71 7.91 47.62 


1.66 
5.76 
2.66 
5.97 
10.03 
7.41 
12.08 
11.10 
7.03 
10.59 
8.93 
10.17 
6.46 


3.10 
59.00 

3.52 
51.23
49.74 
56.84 
57.23 
59.16 
50.77 
53.42 
55.61 


2.48 
6.92 
2.07 
5.67 
10.17 
5.70 
7.43 
9.74 
6.59 
8.14 
7.44 


3.38 
61.31 

2.38 
48.07 
48.62 
53.59 
53.03 
46.31 
53.28 
54.10 
54.90 


3.11 
58.97 

3.03 
49.93 
48.29 
55.73 
55.59 
50.17 
51.96 
54.99 
57.61 
46.91 


2.06 
7.43 
2.25 
5.71 
8.34 
6.45 


61.96 

3.16 
48.49 
51.84 
54.81 


6.82 
2.51 
5.26 
9.53 
6.57 





*L and F scale scores are in raw score form. 


the statistical differences found in the two 
studies. Such a check is especially necessary 
in this research since the 1949 group was not 
“pure” but consisted of non-volunteers and of 
those who would have volunteered for the 
MMPI if asked to. Another reason for the 
desirability of replication is that differences 
between the two General College groups could 
conceivably be due to the fact that the groups 
were drawn from separate populations who 
came to the University one year apart. 


The comparison shows that of the 12 differ- 
ences at 20 per cent or better found between 
General College groups for males, females, or 
both sexes combined, 7 were substantiated at 
5 per cent or better in the second study. Of 
16 differences at 15 per cent or better found 
between the “psychological experiment” 
groups, 7 were substantiated in the General 
College study at 20 per cent or better. Refer- 
ence to Table 1 shows that 7 of the unsub- 
stantiated differences had the same direction 


Table 3 
Comparison of Groups on MMPI Scores in the Two Studies 





Male 





Female 


Both Sexes 





Second 
Study 


First
Study 


First 
Study 





Second 
Study 


First 
Study 





+(20%) 


+(1%) 
+(10%) 
+(5%) 
+(10%) 


+(10%) 


+(5%) 
— (15%) 
+(5%) 


— (15%) 


+(5%) 
— (15%) 
+(5%) 

+(6%) 


+(2%) 


+(2%) 

— (15%) 
+(0.1%) +(5%) 
+(15%) 
— (10%) _ 
+(5%)
+(5%) 


— (5%) 


+(5%) +(1%) 


— (10%)
of difference, and 2 had opposite directions.
The use of significance levels no better than 
20 per cent seems, therefore, to be justified in 
this investigation. Further, there is evidence 
of considerable consistency of differences be- 
tween volunteers and non-volunteers in non- 
identical, though not completely dissimilar, 
situations. 

The indications from Table 3 are that volun- 
teers, as compared to non-volunteers, show 
some tendencies toward higher scores on the 
D, Pt, and K scales, higher Pd and Mf scores 
among males only, and higher scores on Pa and 
lower scores on Ma among females only. 

Table 4
Comparison of Mean Total Score of Volunteers and Non-Volunteers on F Scale of Questionnaire in the Second Study

                         Male      Female    Both Sexes
Volunteer       M       -24.40    -36.47     -30.63
                SD       18.39     20.51      20.42
Non-Volunteer   M       -12.30    -13.85     -13.14
                SD       18.68     14.09      16.37
Difference               12.10     22.62      17.49
t                         2.59      5.30       5.41
P                         1%        .001%      .001%

Turning now to the measures available in the second study only, Table 4 presents comparisons of volunteers and non-volunteers on the
30 items of the F scale. Male and female 
volunteers evidenced significantly less of the 
conventionalism, authoritarianism, power pre- 
occupation, and tendency to projection meas- 
ured by this scale. Volunteers took longer to 
complete the questionnaire. The time differ- 
ences were significant at the .001 per cent level 
for males, females, and both sexes combined. 
On three of the four items dealing with atti- 
tudes toward Psychology, volunteers differed 
significantly from non-volunteers. Volunteers 
expressed themselves as finding Psychology 
more rewarding and more enjoyable than non- 
volunteers, and objected to experiments of the 
present kind much less. The Chi² values for these 3
items were significant at the 6, 5, and .001 per
cent levels, respectively. Volunteers did not
criticize Psychology as being impractical as
often as did non-volunteers, but the difference
was not significant. 
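The Difference row of Table 4 is the non-volunteer mean minus the volunteer mean, and t is that difference divided by its standard error. The minimal sketch below recomputes the differences from the tabled means and shows one way an independent-groups t could be formed. Group sizes are not reported in this section, so any Ns passed to the illustrative helper `t_independent` are hypothetical. Its unpooled standard error is also an assumption, not the author's stated formula.

```python
import math

# Table 4: (mean, SD) of F scale totals for each group.
TABLE4 = {
    "Male": {"vol": (-24.40, 18.39), "non": (-12.30, 18.68)},
    "Female": {"vol": (-36.47, 20.51), "non": (-13.85, 14.09)},
    "Both Sexes": {"vol": (-30.63, 20.42), "non": (-13.14, 16.37)},
}


def mean_difference(group: str) -> float:
    """Non-volunteer mean minus volunteer mean (the Difference row)."""
    m_vol, _ = TABLE4[group]["vol"]
    m_non, _ = TABLE4[group]["non"]
    return round(m_non - m_vol, 2)


def t_independent(m1, sd1, n1, m2, sd2, n2):
    """t for m2 - m1 from two independent groups, using the unpooled
    standard error sqrt(sd1^2/n1 + sd2^2/n2) (an assumption)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (m2 - m1) / se


for g in TABLE4:
    print(g, mean_difference(g))
```

For example, `t_independent(-24.40, 18.39, n, -12.30, 18.68, n)` gives a male t for an assumed N of `n` per group. It will reproduce the tabled 2.59 only if the actual group sizes are used.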

Volunteers differed significantly from non- 
volunteers on a small number of the question- 
naire face-sheet items. Female volunteers 
were younger than female non-volunteers at the 
1 per cent level of confidence, had a higher 
grade-point average at the 10 per cent level, 
expected a lower income in 10 years at the 6 
per cent level, and reported father’s income to 
be lower at 5 per cent. Neither males alone,
nor the total of males and females, showed significant differences on any of these four items.


Less frequent attendance at church services 
was reported by volunteers, the Chi² being
significant at 6 per cent. There were more
Jews among volunteers than among non-volunteers; the Chi² was significant at 15 per
cent. 
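Several comparisons in this section rest on a chi-square test of a contingency table. As a minimal sketch, the function below computes Pearson's chi-square for a 2 × 2 table (volunteers vs. non-volunteers, by two response categories such as frequent vs. infrequent church attendance) and its p-value with one degree of freedom. The section does not report cell frequencies, so the counts in the usage line are hypothetical. No Yates continuity correction is applied, and the author's exact tables may have had more categories.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the fourfold table

                        category 1   category 2
        volunteers           a            b
        non-volunteers       c            d

    Returns (chi2, p), where p is the upper-tail probability for 1 df."""
    n = a + b + c + d
    den = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / den
    p = math.erfc(math.sqrt(chi2 / 2))  # P(chi-square with 1 df > chi2)
    return chi2, p


# Hypothetical counts, for illustration only.
chi2, p = chi_square_2x2(20, 30, 35, 25)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```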

For all other distributions of the face-sheet 
variables no significant differences were found. 
Volunteers did not differ from non-volunteers 
in proportion of males and females constitut- 
ing the groups; in frequency of usage of a code 
to identify their own questionnaires;¹ in frequency of urban vs. rural residence; in number
of male veterans; in political preferences; or in 
grades in the previous quarter’s Psychology 
course. 

Volunteering behavior proved to be meas- 
urably reliable as inferred from the comparison 
of volunteers from one section with volunteers 
for a later investigation. The Chi² for this distribution was significant at the 5 per cent level.

There was a significant relationship between 
volunteering and asking for individual interpretation of questionnaire results. The Chi²
was significant at .001 per cent. The distribution of relation between volunteering and asking
for MMPI interpretation was significant at 
only the 10 per cent level. It will be recalled 
that, as subjects saw it, there was no connection 
between the “psychological experiment” and 
the MMPI. Despite attenuation of the Chi²,
probably due to this apparent lack of connec- 
tion, there seems to have been a relationship 


¹ Subjects had been given a choice of writing names
or private code identifications on their questionnaires. 
There was no difficulty in learning the names of those 
subjects who used code-designations since at any indi- 
vidual experimental hour the number of subjects was 
— and these could be checked against sign-up 
sheets.
between readiness to volunteer and interest in
own personality. 


Interpretation of Results 


In interpreting the statistical results two pro- 
cedures will be followed. Greater weight will 
be given to the second study since its two 
groups were “pure” whereas the 1949 General 
College group included both volunteers and 
non-volunteers. Secondly, it will be assumed 
that the unreplicated results of the second 
study are as valid as the replicated MMPI 
findings. This procedure seems justified by 
the consistency, by and large, of MMPI with 
other results. 

The statistical results indicate that volun- 
teers showed some tendencies to feeling and 
admitting more discouragements, anxieties, 
and inadequacies than non-volunteers (D and 
Pt scales). They had a tendency to the kind
of defensiveness measured by the K scale,
though it is doubtful if this result indicates a 
deep-seated unconscious defensiveness. Lesser 
fascist-mindedness was indicated by lower 
scores on the F scale; it may be asserted that 
volunteers were less prone to the conventionality, authoritarianism, preoccupation with
power and pseudo-toughness, and projectivity 
measured by the F scale. Both the F scale 
comparisons, and the comparisons of scores on 
the items dealing with attitudes toward Psy- 
chology, provide evidence for inferring that 
volunteers were more intraceptive and psy- 
chological-minded. 

The longer time taken by volunteers to com- 
plete the questionnaire seems to point to 
greater seriousness and perhaps greater enjoy- 
ment in filling out the questionnaire; this in- 
terpretation is consistent with the hypothesis 
of volunteers’ psychological-mindedness. It 
would seem that volunteers were more ego- 
involved in the experiment, due either to their 
personality characteristics, or the act of volun- 
teering itself, or to a combination of both 
factors. This ego-involvement is pointed to 
by the time difference, and also by the interest 
in own personality displayed in requesting 
questionnaire and MMPI interpretation. 

Volunteers were less prone to church at- 
tendance, again indicating a lesser degree of 
conventionality than non-volunteers.

On certain variables, male and female subjects did not show identical differences. Thus,
male volunteers showed somewhat higher 
femininity and Pd scores than non-volunteers, 
possibly indicating a tendency to esthetic- 
mindedness, on the one hand, and individual- 
istic non-conventionality, on the other hand. 
This is an interpretation in accord with the 
intraception and low F scale scores of volun- 
teers. 

Female volunteers, on the contrary, showed 
slightly higher scores than non-volunteers on 
Pa, and lower on Ma. Interpretation of this 
finding must remain extremely tentative: 
possibly the Pa scale difference indicates 
dominance and aggressiveness among female 
volunteers. The lower Ma score may perhaps 
indicate serious-mindedness and steadiness.
This interpretation finds some support in the 
higher grade-point average reported by female 
volunteers. 

A number of measures did not differentiate 
volunteers from non-volunteers: they did not 
differ significantly, for example, on the Hs, Hy,
or Sc scales of the MMPI. Interestingly
enough, they also did not differ on the SIE 
scale. Motives for volunteering in the situ- 
ations utilized in this research were apparently 
not identical with motives accounting for 
participation in interpersonal relationships 
and social situations. 

Vocational interests did not differentiate the 
two groups in the first study to any consider- 
able extent. It was anticipated that General 
College volunteers might show less certainty 
than non-volunteers in occupational choice, 
and accordingly less channelization of voca- 
tional interests, but this expectation was not 
borne out. The only difference found on the 
Strong Blank was the tendency of the 1949 
General College female group to show more 
feminine occupational interests than 1948 
female volunteers. The meaning of this 
difference is not clear. 

Similarly, neither high-school rank nor 
grades in the previous quarter’s Psychology 
course differentiated the groups. It would 
seem that volunteering for a situation involv- 
ing a personality test or experiment is not 
necessarily associated with feelings of success 
in scholastic tests in general.

The lack of differentiation on a large num- 
ber of objective or semi-objective face-sheet 
variables seems to bear out the hypothesis that 
volunteers differ from non-volunteers on psychological variables to a greater extent than
they do on “sociological” variables. The few 
face-sheet differences found, such as difference 
in frequency of church attendance, were con- 
sistent, in the main, with differences found by 
other measuring instruments. In general, 
however, the number of face-sheet differences
was scarcely greater than chance expectation.
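A rough check on "chance expectation" is a binomial count. If n comparisons were independent and no true differences existed, the number reaching the α level would follow a binomial distribution with mean nα. The sketch below applies this to the 32 comparisons reported at the start of this section, five of which reached the 5 per cent level or better. Independence is assumed only for illustration. Correlated scales and face-sheet items violate it, so this is a heuristic and not a test the author performed.

```python
from math import comb


def prob_at_least(k: int, n: int, alpha: float) -> float:
    """P(X >= k) for X ~ Binomial(n, alpha): the chance of k or more
    'significant' results among n independent comparisons with no true effect."""
    return sum(comb(n, i) * alpha ** i * (1 - alpha) ** (n - i) for i in range(k, n + 1))


n, alpha = 32, 0.05        # 32 comparisons at the 5 per cent level
observed = 1 + 1 + 3       # significant at the 0.1, 1, and 5 per cent levels
print("expected by chance:", n * alpha)
print(f"P({observed} or more by chance): {prob_at_least(observed, n, alpha):.3f}")
```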

In sum, volunteers seem to have differed 
from non-volunteers on a number of measures 
which perhaps form a cluster which may be 
termed “intraceptive non-conventionality.” 
The specific differences found seem largely to 
be various facets of this core characterization. 
The measures on which volunteers did not 
differ from non-volunteers were, in the main, 
not aspects of this cluster. 


Implications 


The differences found between volunteers 
and non-volunteers in this research seem to be 
particularly applicable to studies using volun- 
teer subjects in which personality or attitudes, 
or activities closely related to these, are the 
object of inquiry. It may be concluded that 
utilization of volunteer subjects affects the re- 
sults of such studies, though the direction of 
effect is a function of the specific study as well 
as of general differences between volunteers 
and non-volunteers. Of course, a random 
non-volunteer sample is always preferable, but 
for certain studies, e.g., many clinical inquiries, 
dependence on highly cooperative volunteers 
seems inevitable. The results found in this 
research indicate that such volunteers may 
constitute specialized groups, differing from 
the general population in somewhat predictable 
ways. Further check of the differences from 
the general population, in a variety of di- 
mensions and a variety of situations, would 
clarify the effects of the methodological use of 
volunteer subjects. 


Summary 


The purpose of this research was to investigate the presence of consistent personality and
attitude differences between volunteers and
non-volunteers, and the reliability of volun- 
teering behavior. 

In two separate situations, volunteers were 
compared with non-volunteers on a number of 
psychometric and face-sheet variables. In 
terms of the specific instruments used, it was 
found that volunteers showed a greater tend- 
ency than non-volunteers to admission of dis- 
couragements, anxieties, and inadequacies, 
and, at the same time, some tendency toward 
defensiveness. Volunteers were less fascist- 
minded than non-volunteers, more intraceptive 
and psychological-minded, and less conven- 
tional. 

Male volunteers tended toward greater 
femininity of interests; female volunteers 
showed a higher degree of serious-mindedness 
than non-volunteers. 

A large group of measures did not differenti- 
ate volunteers from non-volunteers: grades, 
degree of vocational interest channelization, 
and self-report of a number of variables, largely 
sociological in nature. 

The findings have implications for conclu- 
sions drawn from studies based on volunteer 
subjects, particularly in the areas of personality 
and attitudes. Volunteers seem to differ from 
non-volunteers in dimensions which could 
easily limit the validity of generalizations that 
can be made from volunteer-based research, 
unless a correction for sampling bias is made. 


Received July 27, 1950.

Studies in Item Analysis 2: Effects of Various
Methods upon Test Reliability * 


Jerome H. Ely** 
Occupational Research Center, Purdue University 


An integral step in test construction is the 
selection of items to be included in the final 
form of the test. This can be based entirely 
upon a subjective evaluation, but usually it 
includes the use of some method of item analy- 
sis. Each such method has the same purpose: 
to select those items which can be combined 
into a test of maximum validity. 

Many different methods of item analysis 
have been described in the literature (1, 13). 
Most are similar, but each involves different 
statistical manipulations. Which then should 
the test constructor use? The general aim of 
this study has been to compare four of them in 
an attempt to answer this question. 


The Problem 


Purpose of Study: The primary purpose of 
this study was to compare four different 
methods of item analysis under varying condi- 


tions, using test reliability as the criterion for 
evaluation. The four major hypotheses tested 
were: 


1. Method of item analysis does not affect 
test reliability.

2. Size of criterion groups¹ does not affect
test reliability. 

3. Test length does not affect test reliability. 

4. Interactions among the three main variables (length × method, length × size, method × size, length × method × size) do not affect test reliability.


A secondary purpose was to measure the 
amount of overlap between items selected
by these four methods. 


* This study is based upon a thesis submitted in 
partial fulfillment for the degree of Doctor of Philos- 
ophy, done under the direction of Professor C. H. 
Lawshe. The original data for this study can be found 
in the author’s thesis on file in the Purdue University 
library. 

** The author is now serving as a Research Director, Dunlap and Associates, Inc., Stamford, Connecticut.


¹ Criterion groups are defined as the group of individuals scoring in the top given per cent of all scores
(referred to as the “good” group) and the group scoring 
in the bottom given per cent (referred to as the “poor” 
group) on the basis of total score made on the test. 


Criterion of Evaluation: For the purpose of 
this study test reliability was selected as the 
criterion for the worth of a test. Specifically 
odd-even split-half test reliability was em- 
ployed. It is realized at the outset that this 
imposes certain restrictions upon the conclu- 
sions to be drawn. Reliability is by no means 
the only criterion which might be used; in fact 
it is generally recognized to be inferior to test validity.
However, in many applied situations the test 
constructor has no way of evaluating test valid- 
ity until the test has been constructed; there- 
fore, he must rely upon some substitute cri- 
terion. Since validity is to some extent de- 
pendent upon reliability, the latter frequently 
serves as a substitute. 

There are many ways of measuring reliabil- 
ity. Again, however, in the applied situation 
the test constructor will usually have to use a 
single-trial estimate. Some evidence has been 
found by Gage and Damrin (7) that the split- 
half method, when used in tests similar to the 
one employed in this study, yields reliability 
coefficients which are very close to those ob- 
tained by other single-trial estimates. 


Review of the Literature 


Excellent reviews of the literature can be 
found in the writings of Long and Sandiford 
(13) and Adkins (1). Their experimental 
studies remain the classic ones in the field. 

Long and Sandiford used thirteen different 
methods of item analysis to select the best 100 
items from a 500 item vocabulary test. Scores 
made on the 100 item test were then correlated 
with those made on the 500 item test. The 
higher the correlation, the better the method in- 
volved was assumed to be. They concluded 
that many of the commonly used methods, in- 
cluding Bi-serial r and the Kelley Technique, 
differ very little in effectiveness. 

Adkins selected the best 15 items from a 150 
item test using ten different methods. Toops’ 
L-Method was considered the “ideal” one; 
therefore, results obtained from the other nine
were compared with it. Using scholastic indices as the criterion, she found the L-Method to
give the most valid test; however, its superi- 
ority over some of the other methods was very 
small. When the same items were given to an- 
other group of subjects, the superior validity 
obtained by the L-Method disappeared com- 
pletely. Adkins, therefore, concluded that 
one of the simpler methods might well be used. 

The two studies most similar to the present 
one are those of Lawshe and Mayer (12) and 
Mason (14). In both studies 300 test items 
were analyzed using different methods; tests 
of varying lengths were selected, and these 
were evaluated in terms of split-half reliability. 
Lawshe and Mayer’s study indicated that for 
lengthy tests the use of D-Values gives a some- 
what more reliable test than does Davis’ 
Modification of Flanagan’s r. However, Ma- 
son’s study revealed no significant differences 
resulting from the use of D-Values and Phi- 
coefficients. 

In summarizing, the literature gives some 
justification for using a simple method of item 
analysis rather than a time-consuming one. 
However, the question of which of the various 
simple methods to use has remained unan- 
swered. 


Procedure 


Construction and Administration of Test: 
In order to avoid the common pitfall of using 
a previously validated test whose items are 
already known to be good, the author constructed an entirely new test for this study.
In its final form the test consisted of 150 multi- 


ple-choice vocabulary items. A preliminary 
try-out on 109 subjects showed that the test 
could be completed easily within one class 
period; therefore, it was considered to be a 
power test of ability. 

The test was administered to 1,240 psychol- 
ogy and education students at Purdue Uni- 
versity during the 1949-1950 academic year. 
Administrative conditions were standardized: 
only two individuals acted as test adminis- 
trators, and IBM answer sheets were used. 
Those few individuals not completing the test 
had their papers discarded. 

Selection of Methods of Item Analysis: An 
informal poll taken among experienced test 
constructors found them practically unanimous in selecting three methods of item analysis
for comparison. A fourth method was later 
added by the author. The four methods, with 
the number of significant digits obtained for 
each and the symbol representing each in the 
tables, are as follows: 


1. D-Values: as obtained from Lawshe’s 
nomograph (11) from the original work of 
Kelley (13); two significant digits; symbolized
by “D.”

2. Davis’ Modification of Flanagan’s r: as 
obtained from Davis’ original paper (4); two 
significant digits; symbolized by “r.”

3. Phi-coefficients: as obtained from Jur- 
gensen’s tables (9) from the original work of 
Guilford (8); three significant digits; symbol- 
ized by “Phi.” 

4. Per Cent Method: as obtained by sub- 
tracting the per cent of the “poor” group 
passing the item from the per cent of the 
“good” group passing it; two significant digits; 
symbolized by “%.” This method was included, despite the fact that it in no way
standardizes the scores, because it is the simpl- 
est method available and may be used as a 
base upon which to compare the other methods. 
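The criterion groups and the Per Cent Method can be stated compactly in code. This minimal sketch assumes a 0/1 response matrix, with rows as examinees and columns as items. It forms the "good" and "poor" groups from the top and bottom fraction of total scores. For each item it then computes the per cent difference and, for comparison, a phi coefficient from the 2 × 2 table of group by pass/fail.

The sketch departs from the author's procedure in several ways:
- Phi is computed directly from the fourfold formula. The author read it from Jurgensen's tables.
- D-Values and Davis' r are not reproduced, since they came from a nomograph and published tables.
- Ties at the cut-off are broken arbitrarily.

```python
import math


def criterion_groups(totals, fraction):
    """Return (good, poor): indices of the top and bottom `fraction` of
    examinees ranked by total score. Ties at the cut-off are broken by
    original order (an arbitrary choice)."""
    k = max(1, round(fraction * len(totals)))
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    return order[-k:], order[:k]


def item_indices(responses, fraction):
    """For each item, return (per cent difference, phi) between the good and
    poor criterion groups. `responses` is a list of 0/1 rows, one per examinee."""
    totals = [sum(row) for row in responses]
    good, poor = criterion_groups(totals, fraction)
    results = []
    for j in range(len(responses[0])):
        a = sum(responses[i][j] for i in good)  # good group, passed
        c = sum(responses[i][j] for i in poor)  # poor group, passed
        b, d = len(good) - a, len(poor) - c     # failed
        per_cent = 100 * a / len(good) - 100 * c / len(poor)
        den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
        phi = (a * d - b * c) / den if den else 0.0
        results.append((per_cent, phi))
    return results


responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
for item, (pct, phi) in enumerate(item_indices(responses, 1 / 3), start=1):
    print(f"item {item}: per cent difference = {pct:.0f}, phi = {phi:.2f}")
```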


Conditions under Which Methods Compared: 
Rather than compare the four methods on the 
basis of one test, many different comparisons 
were made by varying test length and size of 
criterion groups. For each method the origi- 
nal test was item analyzed six times, once for 
each of the following sized criterion groups: 
10%, 20%, 27%, 30%, 40%, 50%. Tests 
were then selected which comprised four differ- 
ent lengths: the best 20 items, 40 items, 60 
items and 80 items. Thus 24 different tests 
were selected for each method from the original 
150 item test, thereby making a grand total of 
96 different tests. 
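The bookkeeping of this design is easy to check. Four methods, six criterion-group sizes, and four test lengths give 4 × 6 × 4 = 96 tests. Each test keeps the highest-ranked items under one method's index. The sketch below shows that selection step. The helper name `best_items`, the label "D" for D-Values, and representing item indices as a plain list of numbers are illustrative assumptions.

```python
from itertools import product

METHODS = ["D", "r", "Phi", "%"]    # "D" assumed as the label for D-Values
SIZES = [10, 20, 27, 30, 40, 50]    # criterion-group size, per cent
LENGTHS = [20, 40, 60, 80]          # number of items kept


def best_items(index_values, length):
    """Positions of the `length` items with the largest index values."""
    ranked = sorted(range(len(index_values)), key=lambda j: index_values[j], reverse=True)
    return sorted(ranked[:length])


conditions = list(product(METHODS, SIZES, LENGTHS))
print(len(conditions), "tests")
```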

The first 500 students taking the original 
test were used for the item analyses leading to 
the selection of these 96 tests. This group of 
subjects, hereafter referred to as the Primary 
Group, was assumed to be comparable to the 
remaining students, hereafter referred to as 
the Secondary Group, upon which reliability 
coefficients were later obtained. As is shown 
in Table 1, this assumption seems justifiable in 
that neither the means nor the standard deviations
differ significantly from group to group.





Table 1
Comparison of Different Groups Employed

Group                N     Standard Deviation
Primary             500          12.54
Secondary No. 1     183          13.17
Secondary No. 2     183          13.37
Secondary No. 3     183          13.19
Secondary No. 4     183          13.02

Experimental Design Required for Optimal Number of Comparisons: Whenever two r’s are computed from the same sample, a correlational term is present and must be included in the computation of the standard error of the difference between the r’s. Since most reliability coefficients are high (in this study all exceeded .70), Fisher’s z’ transformation should be made when comparing them. However, the r between z’s is unknown; therefore, the standard error of the difference between z’s of correlated groups cannot be computed. It thus becomes necessary, in order not to underestimate any significant differences, to compare r’s obtained from different samples whenever possible.

Ideally the reliability of each of the 96 tests used in this study should have been obtained from a different group. Since this was impossible, a substitute design was used. By dividing the Secondary Group randomly into four equal groups and using a modified Latin Cube design, most worthwhile comparisons between r’s could be made.

The experimental design is presented in Table 2. It differs from the Latin Cube in that the Secondary Groups are systematically rather than randomly placed in their cells. This permits a maximum number of worthwhile comparisons. For example, the reliability coefficient of the 20 item test which
has been selected by using D-Values with 10% 
sized criterion groups is measured on Second- 
ary Group No. 1. This r can be compared 
with all those obtained from any of the other 
three secondary groups, including: (1) all r’s 
obtained for 20 item tests selected by using 
the other three methods with 10% sized cri- 
terion groups; (2) all r’s obtained for 40 item, 
60 item and 80 item tests selected by using 


D-Values with 10% sized criterion groups; 
(3) all r’s obtained for the 20 item test selected 
by using D-Values with 20%, 30%, 40% and 
50% sized criterion groups. To generalize, 
this design permits all worthwhile comparisons 
to be made with the following two exceptions: 
(1) r’s computed from tests selected by using 
a certain method with 10% sized criterion 
groups cannot be compared with r’s computed 
from tests selected by using the same method 
with 27% sized criterion groups; (2) in the 


Table 2
Experimental Design

Note: The number in each cell represents the Secondary Group used in the computation of the reliability coefficient involved.

[Panels for the 20 Item Test, 40 Item Test, and 60 Item Test, each laid out by Method and Criterion Group Size; cell entries not preserved.]

Table 3
Reliability Coefficients Obtained Under Varying Conditions 





Test Criterion 





Method 





Length Group Size 





20 items 
20 items 
20 items 
20 items 
20 items 
20 items 
40 items 
40 items 
40 items 
40 items 
40 items 
40 items 
60 items 
60 items 
60 items 
60 items 
60 items 
60 items 
80 items 
80 items 
80 items 
80 items 
80 items 
80 items 


10% 
20% 
30% 
40% 
50% 


10% 
20% 
30% 
40% 
50% 


10% 
20% 
30% 
40% 
50% 


10% 
20% 
30% 
40% 
50% 
27% 


.878
.870
.875
.890
.896
.892
.879
.885





same manner r’s between 20% and 50% sized 
criterion groups cannot be compared. 
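When two coefficients do come from different Secondary Groups, the comparison this design allows reduces to a standard normal test on Fisher's z’ = ½ ln[(1 + r)/(1 − r)]. Here z’ has a standard error of 1/√(N − 3). The sketch assumes independent samples. As the text explains, r's from the same sample would need an additional correlational term that is not modelled here. The two coefficients in the usage line are illustrative. N = 183 matches each Secondary Group in Table 1.

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z' = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def compare_independent_rs(r1: float, n1: int, r2: float, n2: int):
    """Normal deviate and two-tailed p for r1 - r2, assuming the two
    coefficients come from independent samples of sizes n1 and n2."""
    se_diff = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se_diff
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal probability
    return z, p


# Illustrative values: two coefficients, each from a group of 183.
z, p = compare_independent_rs(0.896, 183, 0.870, 183)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With groups of this size, a difference of .026 between two coefficients near .88 falls well short of the 5 per cent level.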
Calculation of Reliability Coefficients: Test
papers were scored on the IBM Graphic Item 
Counter. Since split-half reliability coeffici- 
ents were being used, each paper had to be 
scored twice for each reliability measurement, 
thus making a grand total of 35,126 scores. 
These scores were transferred to IBM coding 
cards, which were tabulated in order to obtain 
the final reliability coefficients. The results, 
after having been stepped up by the Spearman- 
Brown formula, are presented in Table 3. 
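The odd-even split-half procedure with the Spearman-Brown step-up can be sketched as follows. The sketch assumes a 0/1 response matrix for the items of one selected test. Each examinee's odd-item and even-item scores are correlated, and the half-test r is then stepped up with r = 2r½/(1 + r½). This illustrates the standard formulas. It does not reproduce the IBM Graphic Item Counter and tabulating-card procedure the author actually used.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half, k=2):
    """Estimated reliability of a test k times as long as the one yielding r_half."""
    return k * r_half / (1 + (k - 1) * r_half)


def split_half_reliability(responses):
    """Odd-even split-half reliability, stepped up with Spearman-Brown.
    `responses` holds one list of 0/1 item scores per examinee."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return spearman_brown(pearson_r(odd, even))


example = [[1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 0, 0], [0, 0, 0, 0]]
print(round(split_half_reliability(example), 3))
```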


Results 

Application of Analysis of Variance among 
Reliability Coefficients: One way of handling data 
of the sort shown in Table 3 is to test the sig- 
nificance of the difference between all r’s taken 
two at a time. Not only is this time consuming; it also leaves one with an extremely difficult problem of interpretation. However, the


design of this experiment was such that it lent 
itself to an analysis of variance after certain 
modifications had been made. 

One assumption made in an analysis of vari- 
ance is that each variable is normally dis- 
tributed. It is known that distributions of 
r’s do not meet this assumption. This can be 
corrected by transforming each r into its 
equivalent z’, the distribution of which is approximately normal. Actually the statistics used for this study were z’s modified in order to facilitate arithmetical procedure. Each statistic (x) was entered into its cell as follows:

x = 1,000 (z’ − 1).

This modification affects neither the normality of the distribution nor the distribution of total variance into its components, since every component is multiplied by the same constant, 1,000².
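The coding used in the analysis of variance is a linear change of scale. The sketch below applies it to coefficients of the size reported in Table 3. It also checks the claim that every sum of squares is multiplied by the same 1,000². Because the expected error variance is coded the same way, the F-ratios are unaffected. The helper names `coded_z` and `sum_of_squares` are illustrative.

```python
import math


def coded_z(r: float) -> float:
    """Cell entry x = 1,000 (z' - 1), where z' is Fisher's transformation of r."""
    return 1000 * (math.atanh(r) - 1)


def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


rs = [0.870, 0.878, 0.896]
zs = [math.atanh(r) for r in rs]
xs = [coded_z(r) for r in rs]
for r, x in zip(rs, xs):
    print(r, round(x, 1))
print("ratio of sums of squares:", sum_of_squares(xs) / sum_of_squares(zs))
```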

Three-Variable Analysis of Variance: Total 
variance was first analyzed into that attribut- 
able to methods of item analysis (Method), cri- 
terion group size (Size), test length (Length), 
and all interactions among these three. It was
realized that not having used independent
groups in the computation of each z’ could re- 
sult in an underestimation of any significant 
differences. 

Error variance in such analyses is usually assumed to be equal to the second-order interaction (Method × Size × Length). However, in this analysis the true error variance is known: it is the squared standard error of z’, multiplied by 1,000² to match the coded entries. The number of degrees of freedom for this term becomes infinite.²

Table 4 shows the results of this analysis. Variance due to test length is the only one which is clearly significant, being beyond the 0.5% level of confidence. This is to be expected from previous knowledge of factors known to affect test reliability. There is some indication that methods of item analysis might affect test reliability, their F-ratio being significant between the 5% and 10% levels of confidence. However, the variance among reliability coefficients computed from different sized criterion groups could easily have occurred by chance.

Table 4
Three-Variable Analysis of Variance

Source        df    Sum of Squares    Mean Square    Probability
Method (M)     3        40,548.28       13,516.09    .10>P>.05
Size (S)       5        16,205.68        3,241.14    .75>P>.50
Length (L)     3     1,546,574.03      515,524.68    .005>P
M × S         15        23,562.78        1,570.85    P>.995
M × L          9        32,796.51        3,644.06    .75>P>.50
S × L         15        27,982.53        1,865.50    .995>P>.990
M × S × L     45       171,257.18        3,805.72    .95>P>.90
Total         95     1,858,926.99       19,567.65    .005>P
Error*         ∞                          5,555.56

* Expected Error Variance in this case is (1000)² σ²(z’), in which σ²(z’) = 1/(N − 3).

² For this and all other statistical innovations used throughout this study the author is indebted to Mr. James Norton of the Division of Educational Reference, Purdue University.

It is puzzling to note that two of the first-order interactions and the second-order interaction are “significantly small” in variance, all being significant at less than the 90% level of confidence. Could this have occurred by chance, or has the estimate of error variance been too large? Further inspection of the data indicates that the latter cause is the more likely one. The standard error of z’ assumes that each z’ has been computed from an independent group. Use of the same groups over and over again should decrease the standard error. Furthermore, the two first-order interactions in which repetition of groups occurs most often (Method × Size and Size × Length) come out with smaller F-ratios than does the third first-order interaction; and the second-order interaction has much less variance than does the error variance.

On the basis of the previous discussion the careful experimenter should be hesitant about drawing definite conclusions from Table 4. It is quite likely that error variance has been overestimated, thereby underestimating all F-ratios. Differences found to be significant will remain so; however, those approaching a given level might, if properly measured, reach it. This then suggests the advisability of making a more rigorous statistical analysis.

One important question can be answered from 
the preceding analysis: is there any significant 
interaction among the three main variables? 
As will be noted from Table 4, the mean 
squares of the four interactions are so small 
that it is extremely unlikely that any are sig- 
nificant, regardless of how much overestima- 
tion of error variance has occurred. It may, 
therefore, be concluded at this point that there 
is no significant interaction present. Having 
drawn this conclusion it becomes possible to 
make the final analysis of the data. 
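The degrees of freedom and mean squares in Table 4 follow mechanically from the 4 × 6 × 4 layout (four methods, six criterion-group sizes, four test lengths). The Python sketch below recomputes those columns from the printed sums of squares. It assumes, as the table's error line suggests, that each F-ratio is the source's mean square divided by the expected error variance of 5,555.56, and it confirms that every interaction mean square falls below that error variance. It reports F-ratios only; converting them to probabilities would need an F or chi-square table.

```python
from math import prod

# Levels of each factor in the three-variable design (Table 4).
LEVELS = {"M": 4, "S": 6, "L": 4}  # methods, criterion-group sizes, test lengths

# Sums of squares copied from Table 4 (coefficients scaled by 1000).
SUMS_OF_SQUARES = {
    "M": 40_548.28,
    "S": 16_205.68,
    "L": 1_546_574.03,
    "MS": 23_562.78,    # M x S
    "ML": 32_796.51,    # M x L
    "SL": 27_982.53,    # S x L
    "MSL": 171_257.18,  # M x S x L
}
ERROR_VARIANCE = 5_555.56  # expected error variance from the table footnote


def degrees_of_freedom(source: str) -> int:
    """df of a main effect or interaction: product of (levels - 1) over its factors."""
    return prod(LEVELS[factor] - 1 for factor in source)


def anova_rows() -> dict[str, tuple[int, float, float]]:
    """Return {source: (df, mean square, F-ratio against the expected error variance)}."""
    rows = {}
    for source, ss in SUMS_OF_SQUARES.items():
        df = degrees_of_freedom(source)
        mean_square = ss / df
        rows[source] = (df, mean_square, mean_square / ERROR_VARIANCE)
    return rows


if __name__ == "__main__":
    rows = anova_rows()
    print("total df:", sum(df for df, _, _ in rows.values()))
    for source, (df, ms, f) in rows.items():
        print(f"{source:>4}  df={df:>2}  MS={ms:>12,.2f}  F={f:7.2f}")
```

The df column sums to 95 (4 × 6 × 4 − 1), and the seven sums of squares add to the printed total of 1,858,926.99.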
Table 5
Latin Cube Analysis of Variance

Source      Degrees of    Sum of          Mean          Probability
            Freedom       Squares         Square
Method       3               16,719.31       5,573.10    .15>P>.10
Size*        3                7,346.81       2,448.94    .50>P>.25
Length       3              985,875.69     328,625.23    .005>P
Group        3                3,408.19       1,136.06    .75>P>.50
Residual    51              128,243.94       2,514.59
Total       63            1,141,593.94

* Includes criterion groups size 10%, 20%, 30%, 40%.


Latin Cube Analysis of Data: The Latin 
Cube, as originally mentioned by Fisher (6), 
is an expansion of the Latin Square design. It 
suffers from the same limitation of the Latin 
Square: it does not permit the measurement of 
any interactions. Each variable is, therefore, 
randomly distributed in order that all inter- 
actions will be equally distributed and will not 
affect the final analysis. 

As previously explained, in this study each 
variable was systematically distributed. This 
procedure did not remove any effects of inter- 
actions. However, the three-variable analysis 


of variance showed no significant interactions 
present, thereby making this limitation unim- 
portant. 

The Latin Cube design permits the analysis 
of four variables simultaneously. This en- 
abled another one to be included; viz., the 
group of subjects upon which reliability coefficients were computed (Group). Previously
it had been assumed (from the data in Table 1) 
that the particular Secondary Group upon which 
reliability coefficients were being made would 
not affect the size of the r’s. Inclusion of this


variable in the analysis at this point permitted 
statistical acceptance or rejection of this as- 
sumption. 
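The degrees-of-freedom split in Tables 5 and 6 (four factors of four levels each in 64 cells, 3 df per factor, 51 residual df) can be checked with a short Python sketch. The cyclic rule used here, which assigns the fourth factor to cell (i, j, k) as (i + j + k) mod 4, is an illustrative assumption and not necessarily the systematic arrangement used in the study. The sketch only verifies the Latin-cube property: each level of the fourth factor appears exactly once on every line of the cube.

```python
from itertools import product

N = 4  # levels per factor: method, criterion-group size, test length, group


def cyclic_latin_cube(n: int = N) -> dict[tuple[int, int, int], int]:
    """Assign the fourth factor's level to cell (i, j, k) as (i + j + k) mod n."""
    return {(i, j, k): (i + j + k) % n for i, j, k in product(range(n), repeat=3)}


def is_latin_cube(cube: dict[tuple[int, int, int], int], n: int = N) -> bool:
    """True if every line of the cube parallel to an axis holds each level exactly once."""
    for a, b in product(range(n), repeat=2):
        lines = (
            [cube[(x, a, b)] for x in range(n)],
            [cube[(a, x, b)] for x in range(n)],
            [cube[(a, b, x)] for x in range(n)],
        )
        if any(sorted(line) != list(range(n)) for line in lines):
            return False
    return True


cube = cyclic_latin_cube()
total_df = N ** 3 - 1          # 64 cells
factor_df = 4 * (N - 1)        # Method, Size, Length, Group: 3 df each
residual_df = total_df - factor_df
print(is_latin_cube(cube), total_df, residual_df)  # True 63 51
```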

In the Latin Cube design the number of 
cells in all arrays must be equal. For Length, 
Method and Group this presented no problem, 
there being four of each. However, two of the
six different sized criterion groups had to be 
discarded. Since the Secondary Groups in the 
cells of the 10% sized criterion groups are 
identical with those in the 27% array (see 
Table 2), one of these two had to be discarded. 
The same reasoning holds true for the 20% and 
50% arrays. In order to make a sufficient 
number of comparisons two Latin Cube analy- 
ses were made, one including criterion group 
sizes 10%, 20%, 30%, and 40%; and the second 
including sizes 20%, 27%, 30%, and 40%. 
The results of the first analysis are shown in 
Table 5, those of the second in Table 6. 

Table 6
Latin Cube Analysis of Variance

Source      Degrees of    Sum of          Mean          Probability
            Freedom       Squares         Square
Method       3               28,648.69       9,549.33    .05>P>.025
Size*        3               11,354.81       3,784.94    .50>P>.25
Length       3            1,091,380.31     363,793.44    .005>P
Group        3               11,733.69       3,911.23    .50>P>.25
Residual    51              157,720.44       3,092.56
Total       63            1,300,827.94

* Includes criterion groups size 20%, 27%, 30%, 40%.

It will be noted from Tables 5 and 6 that once again the variance due to test length is significant beyond the 0.5% level of confidence. And again the size of the criterion groups causes no significant variations among reliability coefficients. The assumption that the
Secondary Group being employed does not 
affect test reliability is found tenable, the 
F-ratios for Groups being significant at less 
than the 25% level. Just as was found in the 
three-variable analysis of variance, both Latin 
Cube analyses indicate that methods of item 
analysis might affect test reliability. This 
finding has resulted in the final statistical 
analysis, described herewith. 
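The F-ratios behind the probabilities in Tables 5 and 6 are each source's mean square divided by the residual mean square (51 df). The Python sketch below recomputes them from the printed mean squares alone. It shows, for instance, why the Method effect sits near the 10% level in Table 5 but beyond the 5% level in Table 6.

```python
# Mean squares copied from Tables 5 and 6 (reliability coefficients x 1000).
TABLE_5 = {"Method": 5_573.10, "Size": 2_448.94, "Length": 328_625.23, "Group": 1_136.06}
TABLE_6 = {"Method": 9_549.33, "Size": 3_784.94, "Length": 363_793.44, "Group": 3_911.23}
RESIDUAL_MS = {"Table 5": 2_514.59, "Table 6": 3_092.56}  # 51 df in both analyses


def f_ratios(mean_squares: dict[str, float], residual_ms: float) -> dict[str, float]:
    """F-ratio of each source = its mean square / residual mean square."""
    return {source: ms / residual_ms for source, ms in mean_squares.items()}


f5 = f_ratios(TABLE_5, RESIDUAL_MS["Table 5"])
f6 = f_ratios(TABLE_6, RESIDUAL_MS["Table 6"])
for source in TABLE_5:
    print(f"{source:<7} Table 5: F(3,51) = {f5[source]:7.2f}   Table 6: F(3,51) = {f6[source]:7.2f}")
```

Turning an F-ratio into a probability requires the F(3, 51) distribution. With SciPy installed, `scipy.stats.f.sf(F, 3, 51)` gives the upper-tail probability. The sketch itself uses only the standard library.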

Comparison of Per Cent Method with all 
other Methods: The high F-ratios for the vari- 
ance between methods shown in Tables 5 and 
6, one significant at approximately the 10% 
level and the other beyond the 5% level, are
taken as sufficient justification for analyzing
this variance even further. One could com- 
pute critical ratios for the differences between 
r’s obtained by the various methods under 
each condition. Table 3 reveals the impracti- 
cality of this. Of the 24 different comparisons 


made among the four methods, the Per Cent 
Method has the lowest reliability coefficients 
75 per cent of the time and the highest reliability coefficients only 12½ per cent of the
time, while the other three methods are prac- 
tically equal in the number of times they ap- 


pear in each rank. For this reason, plus the 
fact that the Per Cent Method was originally 
introduced into the study as a base upon which 
to compare the other three methods, it was 
decided to compare reliability coefficients ob- 
tained by the Per Cent Method with those 
obtained by the other three. 
Table 8
Comparison of Average Observed r’s with Theoretical r’s Obtained by Spearman-Brown Formula

Test Length    Average Observed r    Theoretical r*
20 items       .802                  .802
40 items       .863                  .890
60 items       .880                  .924
80 items       .894                  .942
150 items      .883                  .968

* Indicates r’s computed from Spearman-Brown formula.


Using the technique described by Cochran 
and Cox (3), the variance due to methods in 
both Tables 5 and 6 was broken down into two 
components: variance between the Per Cent 
Method and the other three methods (with one 
degree of freedom) and variance among the 
other three methods (with two degrees of 
freedom). The total of these two variances 
should and does equal that of all variance due 
to methods. As is shown in Table 7, when this 
analysis is made, both F-ratios comparing Per 
Cent Method with the other three are found 
to be significant beyond the 2% level of con- 
fidence. The variance among the other three 
methods does not approach statistical sig- 
nificance. 
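The Cochran and Cox partition splits the 3-df methods sum of squares into a 1-df contrast (Per Cent Method against the other three, coefficients 3, −1, −1, −1) and a 2-df remainder among D, r and Phi. The Python sketch below illustrates this. It uses hypothetical method means, not the study's data, which this section does not give. It also assumes each method fills 16 of the 64 Latin-cube cells. The sketch shows that the two pieces add back exactly to the methods sum of squares, just as the Table 7 components do (14,525.52 + 2,193.79 = 16,719.31).

```python
from math import isclose


def between_ss(totals: list[float], n: int) -> float:
    """Between-groups sum of squares from group totals, each based on n cells."""
    grand = sum(totals)
    return sum(t * t for t in totals) / n - grand ** 2 / (len(totals) * n)


def contrast_ss(totals: list[float], coeffs: list[float], n: int) -> float:
    """Single-degree-of-freedom sum of squares for a contrast among group totals."""
    if not isclose(sum(coeffs), 0.0, abs_tol=1e-12):
        raise ValueError("contrast coefficients must sum to zero")
    value = sum(c * t for c, t in zip(coeffs, totals))
    return value ** 2 / (n * sum(c * c for c in coeffs))


# HYPOTHETICAL method means (r x 1000), not the study's data; Per Cent listed first.
N_CELLS = 16
MEANS = {"Per Cent": 850.0, "D": 880.0, "r": 882.0, "Phi": 875.0}
totals = [N_CELLS * m for m in MEANS.values()]

ss_methods = between_ss(totals, N_CELLS)                       # 3 df
ss_pc_vs_rest = contrast_ss(totals, [3, -1, -1, -1], N_CELLS)  # 1 df
ss_among_rest = between_ss(totals[1:], N_CELLS)                # 2 df: D, r, Phi

assert isclose(ss_pc_vs_rest + ss_among_rest, ss_methods)
print(f"methods {ss_methods:.2f} = contrast {ss_pc_vs_rest:.2f} + among {ss_among_rest:.2f}")
```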

Table 7
Further Analysis of Variance among Methods

Source                         Degrees of   Sum of        Mean          Probability
                               Freedom      Squares       Square
Between % and (D, r, Phi)*      1            14,525.52    14,525.52    .025>P>.02
Among D, r, Phi*                2             2,193.79     1,096.90    .75>P>.50
Total*                          3            16,719.31
Residual*                      51           128,243.94     2,514.59
Between % and (D, r, Phi)**     1            27,122.52    27,122.52
Among D, r, Phi**               2             1,526.17       763.08
Total**                         3            28,648.69
Residual**                     51           157,720.44     3,092.56

* Taken from original data appearing in Table 5.
** Taken from original data appearing in Table 6.

Comparison of Observed Increases in Test Reliability with Those Predicted by Spearman-Brown Formula: The average reliability coefficient for each of the four different length
tests was computed in the recommended man- 
ner (5, 16). Then, taking the average r for 
the 20 item test as the base, the r’s to be ex- 
pected for the 40 item, 60 item, 80 item and 
150 item tests according to the Spearman- 
Brown formula were computed. The result- 
ing comparisons, as shown in Table 8, reveal 
that test reliability increases as test length in- 
creases from 20 to 80 items. However, the in- 
crease in reliability is not so great as that pre- 
dicted by the Spearman-Brown formula. 
Furthermore, the average 80 item test is 
actually found to be more reliable than the 150
item test. 
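The theoretical column of Table 8 comes from the Spearman-Brown formula, r_k = k r / (1 + (k − 1) r). Here r is the reliability of the 20 item test (.802), and k is the factor by which the test is lengthened (k = 2, 3, 4 and 7.5 for 40, 60, 80 and 150 items). The Python sketch below reproduces that column and sets it beside the observed averages. Like the formula itself, it assumes the added items are parallel to the original ones, which is exactly the assumption the study found violated.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability of a test lengthened k-fold with parallel items."""
    return k * r / (1 + (k - 1) * r)


BASE_ITEMS, BASE_R = 20, 0.802
OBSERVED = {20: 0.802, 40: 0.863, 60: 0.880, 80: 0.894, 150: 0.883}  # Table 8

predicted = {items: spearman_brown(BASE_R, items / BASE_ITEMS) for items in OBSERVED}
for items, r_obs in OBSERVED.items():
    print(f"{items:>3} items  observed {r_obs:.3f}  predicted {predicted[items]:.3f}")
```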

There is a logical explanation for the results shown in Table 8. The Spearman-Brown formula assumes that the lengthened test is in all ways equivalent to the shorter test; therefore, those items being added to the test must be as good as those already within the test. This assumption has not been met in this study. The 20 item tests consist of the best 20 items, the 40 item tests of the best 20 items plus the next-best 20, etc. Because the longer tests are not equivalent to the shorter ones, the Spearman-Brown formula will overestimate the reliability of the former. It even becomes possible (as happened in the 150 item test) to add too many poor items and actually decrease test reliability.

Table 9
Per Cent of Item Over-lapping between Various Methods

Criterion    Test                 Methods Being Compared
Group Size   Length      D-r    D-Phi   r-Phi   D-%    r-%    Phi-%   Average
27%          20 items    ?      75%     75%     65%    65%    90%     78%
27%          40 items    95%    73%     73%     63%    65%    9?%     77%
27%          60 items    97%    87%     87%     70%    68%    91%     83%
27%          80 items    98%    83%     83%     74%    74%    91%     84%
30%          20 items    95%    75%     75%     6?%    65%    90%     78%
30%          40 items    93%    68%     73%     63%    65%    93%     76%
30%          60 items    98%    83%     85%     68%    70%    85%     82%
30%          80 items    96%    84%     84%     75%    78%    90%     85%
Average                  96%    79%     79%     68%    69%    90%     80%

Over-lapping of Items Selected by Various Methods: A secondary purpose of this study was to measure the amount of over-lapping among items selected by various methods. Tests selected from only two different sized criterion groups were compared. The 27% sized criterion group, which gave maximum variability among reliability coefficients, and the 30% sized criterion group, which gave minimum variability, were those used. The results of these comparisons are presented in Table 9. In actuality the data shown in Table 9 are an underestimation of the true amount of over-lapping, because chance factors entered somewhat into the selection of items. In selecting the best 20 items, for example, it sometimes happened that there was no dividing line between the 20th and 21st items. The 18th, 19th, 20th, 21st and 22nd items might all have had the same discrimination index. In order to select three out of these five, a table of random numbers was used. The more often this situation arose, the less the amount of over-lapping among items became. Because Phi-coefficients were computed to three significant figures (the other methods were computed to two significant figures), this situation arose least often in making comparisons involving this method.
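The effect of tied discrimination indices on the overlap figures can be sketched in Python. The item names and index values below are hypothetical. The second method is modelled simply as the first one rounded to one decimal, so both methods rank the items identically. Ties at the cut-off are broken by a seeded random draw, which stands in for the table of random numbers. Any overlap below 100 per cent therefore comes purely from tie-breaking, which is why such figures can understate the true agreement between methods.

```python
import random


def top_k(indices: dict[str, float], k: int, rng: random.Random) -> set[str]:
    """Select the k items with the highest index; ties at the cut-off are drawn at random."""
    ordered = sorted(indices, key=indices.get, reverse=True)
    cutoff = indices[ordered[k - 1]]
    certain = [item for item in ordered if indices[item] > cutoff]
    tied = [item for item in ordered if indices[item] == cutoff]
    return set(certain) | set(rng.sample(tied, k - len(certain)))


def percent_overlap(a: set[str], b: set[str]) -> float:
    """Per cent of the items in a that also appear in b (equal-sized selections)."""
    return 100 * len(a & b) / len(a)


# HYPOTHETICAL discrimination indices for eight items under one method ...
METHOD_A = {"i1": .91, "i2": .84, "i3": .79, "i4": .77, "i5": .74, "i6": .52, "i7": .41, "i8": .33}
# ... and a second method that ranks the items the same but reports coarser values.
METHOD_B = {item: round(value, 1) for item, value in METHOD_A.items()}

rng = random.Random(1951)
trials = [percent_overlap(top_k(METHOD_A, 3, rng), top_k(METHOD_B, 3, rng)) for _ in range(10_000)]
print(f"mean overlap of the best 3 items: {sum(trials) / len(trials):.1f} per cent")
```

Under the coarser method, three items tie at .8 for the last two places. The random draw keeps both truly best items only one time in three, so the mean overlap settles near 78 per cent even though the two methods agree perfectly on the underlying order.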

The per cent of over-lapping of items among 
the methods is very high. On the average 80 
per cent of the items selected by one method 
have been selected by the other methods under 
the same conditions. Over-lapping is lowest 
between the Per Cent Method and D-Values 
and between the Per Cent Method and Davis’
r; it is greatest between D-Values and Davis’
r (averaging 96 per cent). Since the per cent 
of over-lapping appearing in Table 9 is an 
underestimation, the true percentage between 
D-Values and Davis’ r actually approaches 100 per cent. For practical purposes one can conclude that the two methods are selecting identical items.

There is a slight tendency for the per cent
of over-lapping to increase as test length in- 
creases. However, there appears to be no 
difference between the amount of over-lapping 
of items selected by using the various methods 
with 27% sized criterion groups and that of 
items selected by using the same methods with 
30% sized criterion groups. 


Conclusions 


On the basis of the data analyzed in this 
study the following conclusions have been 
drawn: 


1. The hypothesis that reliability coeffici- 
ents do not differ significantly as test length 
increases is rejected beyond the 1% level of 
confidence. Test length does improve test 
reliability provided good items are being added. 
Adding items poorer than those already in- 
cluded in the test will fail to improve test reli- 
ability to the extent predicted by the Spearman- 
Brown formula. Addition of too many poor 
items can decrease reliability. 

2. The hypothesis that reliability coeffici- 
ents do not differ significantly as the size of the 
criterion group varies is found to be tenable. 
There is no evidence that the size of the criterion group affects test reliability, none of the
F-ratios for Size being significant at the 25% 
level. This throws serious question upon the 
suggestion made by Kelley (10) and followed 
by many test constructors that 27% is the best 
criterion group size. 

3. The hypothesis that reliability coeffici- 
ents do not differ significantly as the method of 
item analysis varies is rejected at the 2% level 
of confidence. All three standardized methods 
of item analysis (D’s, r’s and Phi’s) yield tests 
of comparable reliability; however, each yields 
tests of significantly higher reliability than does 
the Per Cent Method. 

4. The hypothesis that reliability coefficients 
do not differ significantly because of any interactions among the three main variables is
found to be tenable. No method yields a con- 
sistently more reliable test than do the others 
when the test is a certain length or when a cer- 
tain sized criterion group is used. No con- 
sistent relationship exists between test length 
and size of criterion group. 

5. The per cent of over-lapping of items 
selected by the various methods is very high, 
averaging 80 per cent. There is some tend- 
ency for the per cent of over-lapping to in- 
crease as test length increases. In all situ- 
ations items selected by D-Values and Davis’ 
r are practically identical. 


Summary 


A 150 item vocabulary test was item analy- 
zed into 96 sub-tests, using four different 
methods of item analysis, four different length 
tests, and six different sized criterion groups.
Split-half reliability coefficients were obtained
for each of the 96 sub-tests and were compared
in order to evaluate the effects of test length, method of item analysis, and size of criterion group on test reliability. The
conclusions were that increasing test length 
raises test reliability so long as good items are 
being added, that D-Values, Davis’ r and Phi- 
coefficients yield tests of comparable reliability 
but all are superior to the Per Cent Method, 
that most of the items selected by any one 
method will be selected by the others, and that 
size of criterion group does not affect test reli- 
ability. 

The author realizes that the conclusions can- 
not be generalized to apply to all test con- 
struction situations. A single test was used 
for this study, the criterion for evaluation was 
not the most ideal one, there were relatively 
few methods being compared, and these were 
compared under a limited number of condi- 
tions. Nevertheless, it is felt that this study 
gives some justification for the use of D-Values, 
Davis’ r and Phi-coefficients in future test 
construction regardless of the length of the 
test or the size of the criterion groups employed. 
All appear to be equally good; therefore, the 
test constructor would do well to choose his 
method on the bases of accessibility and sim- 
plicity. As to the customary usage of the 
27% sized criterion group, serious doubt is 
raised as to whether this or any other sized 
criterion group is always the most desirable
one to use. 


Received September 15, 1950. 

Note on Ely’s “Effects of Various Methods upon Test Reliability”


C. E. Jurgensen 
Minneapolis Gas Company 


Ely’s article Studies in Item Analysis 2: 
Effects of Various Methods upon Test Reliability, 
which appears in this issue is interesting, in- 
formative, and valuable to psychologists who 
work in the field of item analysis. The author 
is to be commended for an excellent research 
study in an area which needs and deserves fur- 
ther research. However, there are certain im- 
plications in Ely’s article which seem to require 
clarification. 

On the basis of data in Table 3, Ely states 
“Of the 24 different comparisons made among 
the four methods, the Per Cent Method has the 
lowest reliability coefficients 75 per cent of the 
time and the highest reliability coefficients 
only 12½ per cent of the time, while the other
three methods are practically equal in the 
number of times they appear in each rank.” 
The reader should note that if the highest rank 
is considered 1, the lowest possible rank is 4. 
Ranks are often difficult to interpret, especi- 
ally when N = 4 and the assumption of nor- 
mality of distribution cannot be entertained. 
In this case it seems important to consider the 
extent of differences as well as the rank. Ely’s 
Table 3 gives twenty-four correlations for each 
of the four methods of item analysis which he 
investigated. The median coefficient for each 
of these methods gives one indication of the 
extent of difference between these four meth- 
ods. Median coefficients are as follows: 

D = .876; r = .878; Phi = .864; and % = .864.

With respect to median correlations, it ap- 
pears that the Phi and Per Cent methods of item 
analysis give equally reliable tests and that 
these methods give median reliability coeffici- 
ents which are only .012 and .014 less than those 

obtained by the D and r methods. 
Ely’s third conclusion reads: “The hypothesis that reliability coefficients do not differ significantly as the method of item analysis varies
is rejected at the 2% level of confidence. All 
three standardized methods of item analysis 
(D’s, r’s and Phi’s) yield tests of comparable 
reliability; however, each yields tests of sig- 
nificantly higher reliability than does the Per 
Cent Method.” If the reader interprets, as 
he should, the second sentence as statistically 


“significantly higher reliability,” the sentence 
is true. If he interprets “significantly higher” 
in any way other than statistically, it is ques- 
tionable to say the least. In the summary Ely 
states “. . . that D-Values, Davis’ r and Phi- 
coefficients yield tests of comparable reliability 
but all are superior to the Per Cent Method . . . .” Again, the interpretation of “superior”

must be limited to statistically superior. It is 
more than doubtful that a median increase in 
reliability of .014 or less indicates superiority 
in any other than a statistical sense. 

Interpreters of data must constantly remind 
themselves that any difference will be “sig- 
nificant” or “superior” if the N is sufficiently 
large. Statistical significance is quite differ- 
ent from practical significance. Although 
practical significance requires statistical sig- 
nificance, many differences are statistically 
significant but have no practical importance 
whatsoever. The difference between statisti- 
cal and practical significance does not appear 
to have been given sufficient emphasis in texts. 
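Jurgensen's point that a fixed difference becomes “significant” once N is large enough can be made concrete. The Python sketch below applies the familiar Fisher z test for the difference between two correlations from independent samples of equal size n to his median coefficients, .878 and .864. That test is an illustrative assumption: Ely's coefficients do not come from independent samples, and he tested them by analysis of variance. The numbers therefore show only how the same .014 difference moves from non-significant to significant as n grows.

```python
from math import atanh, erfc, sqrt


def fisher_z_statistic(r1: float, r2: float, n: int) -> float:
    """z for r1 - r2, assuming two independent samples of n cases each."""
    return (atanh(r1) - atanh(r2)) / sqrt(2 / (n - 3))


def two_sided_p(z: float) -> float:
    """Two-sided normal tail probability, P(|Z| >= |z|)."""
    return erfc(abs(z) / sqrt(2))


R_HIGH, R_LOW = 0.878, 0.864  # median coefficients for the r and Per Cent methods
for n in (50, 500, 5_000):
    z = fisher_z_statistic(R_HIGH, R_LOW, n)
    print(f"n = {n:>5}: z = {z:5.2f}, two-sided P = {two_sided_p(z):.4f}")
```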

In the middle ages, some philosophers dis- 
puted with great solemnity in regard to the 
number of angels that could stand on the head 
of a pin. In the middle of the twentieth 
century some statisticians are engaged in what 
amounts to almost the same kind of disputation 
in which microscopic differences are accepted as “important” if they meet some arbitrary criterion of significance at the 1 per cent, the 2 per cent, or the 5 per cent level. In other words, there is “much ado about nothing.”

The above does not imply that Ely’s article 
is unimportant. On the contrary, the article 
appears to have considerable value. However, 
the value does not lie in the demonstration 
of statistical superiority of the D, r, and Phi 
methods of item analysis to the Per Cent 
method. Rather, its value is in the finding
that (within the limits of this experiment) the 
simple and easily obtained per cent difference 
will be equally as effective in similar practical 
situations as the more difficult and harder to 
obtain D, r, and Phi methods. 


Received January 3, 1951. 
Published out of turn by the editor. 

Fatigue in House Care


Irma H. Gross 
Department of Home Management and Child Development, Michigan State College 


and 
S. Howard Bartley 
Department of Psychology, Michigan State College 


Many commonsense experiences assure us 
that the essential bodily symptoms of fatigue 
are frequently induced not by work but by the 
mere contemplation of it, even though the 
formal outlook on fatigue imputes fatigue to be 
the result of work or activity. Certain recent 
students¹ of the subject have come to the conclusion that fatigue is an expression of certain
forms of disorganization with reference to 
activity, rather than of energy lack. They 
put fatigue in a category parallel to anxiety 
and other distinguishable personality mani- 
festations. Fatigue, therefore, is to be dis- 
tinguished from impairment (a metabolic 
matter), and is to be studied from the individual’s relation to a task. Fatigue should not be identified with work output, and therefore work decrement is not to be taken as a measure of it.

The study of fatigue from the worker’s 
standpoint is considerably more difficult than 
the measurement of physical conditions ex- 
ternal to him, or the measurements of iso- 
lated physiological processes. This difficulty 
has certainly been a factor in retarding the 
shift of attention of investigators from the 
more usual kind of fatigue studies to the kind 
implied by the personalistic viewpoint. It is 
time, however, for this shift to occur, even 
though the first efforts be far from as penetrat- 
ing and precise as could be desired. The pres- 
ent report represents a beginning in this 
direction and should be looked upon in this 
light. 

During the study of work simplification of 
certain household tasks, the Department of 
Home Management at Michigan State College 
became interested in fatigue. It was quite 
easy to demonstrate reduction in time through 
improvement of work methods, but the ques- 


¹ Bartley, S. Howard, and Chute, Eloise. Fatigue
and impairment in man. New York: McGraw-Hill, 
1947. 


tion arose as to whether fatigue was reduced by 
this reduction in time. A project was there- 
fore proposed to compare energy output of 
persons carrying out the same sequence of 
house care activities by improved methods. This project was never undertaken because, in developing the plans for it, it soon became apparent that the energistic study was not called for: relatively few household tasks demand enough energy output to cause any high degree of fatigue as conventionally defined.

The study about to be reported involved an 
inquiry into the occurrence of fatigue in weekly 
house cleaning. The study aimed at determining what the tasks meant to those performing them, and what relations these attitudes
bore to the fatigue produced. Fatigue in this 
study is not defined conventionally, but is con- 
sidered to be a personal state of aversion with 
reference to activity involving a feeling of 
bodily discomfort and inadequacy. 

The suppositions underlying the work were: (1) fatigue is a personalistic affair and must be studied from the same standpoint as other personalistic states such as anxiety, rather than as an inadequacy through tissue impairment; (2) the fatigue state bears no consistent relation to the amount of energy expended; (3) the amount of fatigue bears a positive relation to the general dislike for the task at hand; and (4) the individuals manifesting fatigue are more likely those whose behavior also exhibits disorder.

Subjects. The subjects were a relatively
homogeneous group. All had to: (1) have 
lived at their then present address for at least a
year; (2) be the mother of at least one school 
child; (3) have been married at least 10 years; 
(4) have not reached the menopause; (5) have 
considered themselves in their usual state of 
health; (6) have not been keeping roomers or 
boarders; and (7) be in the habit of doing their 
own cleaning. To obtain a fair cross section
of women in the town of about 10,000, a random 
sampling on a block basis was planned. The 
drawing of subjects from larger units was 
necessitated, however, in some cases in order 
to meet the criteria just mentioned. 

Procedure. The same trained observer made 
all the observations. The interviewing was 
accomplished in three steps: (1) preliminary 
contact to determine whether the subject met 
the specifications and was willing to cooperate 
in the project; (2) a second contact of approxi- 
mately two hours or more during which the 
observer made a Process-Chart-Man Analysis 
of the subject’s customary cleaning activities 
and questioned her on her development of 
fatigue; and (3) another contact during which 
the procedures of the second contact were re- 
peated concerning the development of fatigue 
at different stages of the cleaning activities. 

The observations and questions of the inter- 
viewer were all aimed at determining: (1) the 
amount of motor activity involved in the sub- 
ject’s cleaning; (2) the amount and quality of 
cleaning accomplished; (3) the degree of order- 
liness or disorderliness of the subject’s be- 
havior; and (4) the reported presence or 
absence of fatigue. 

No attempt is made here to discuss the find- 
ings of the present study as statistical data 
because of the small number of cases (20). 
Nevertheless a short presentation of the general findings is in order.


Results 


The interviewer made a chart for each sub- 
ject on each of the two work periods, recording 
fatigue as a function of time. See Figure 1. 
Of the 20 women, six reached a state of “great” 
fatigue at one or more points during the two 
periods. These subjects are shown as the first
six cases in the figure (case numbers in circles). 
Five of the 20 women indicated that they did 
not get tired during their two periods of clean- 
ing, or that they reached only the state of 
“little” fatigue. These five are pictured as the 
last cases in the figure (case numbers in 
squares). The gaps in the lines indicate 
pauses, either to rest, or as a result of inter- 
ruption, or to perform some unrelated task 
such as telephoning. 

The graphs indicate that even in the cases 
in which extreme fatigue developed, it some- 
times reached its peak part way through the 
work period and then subsided without a rest 
period to account for it. This is in opposition 
to the concept that fatigue is an energistic 
affair. In some cases pauses resulted in lessen- 
ing fatigue and in others, they did not. 

For a number of subjects, although the 
cleaning activities were not necessarily identi- 
cal, the developmental pattern of fatigue was 
Figure 1. Development of fatigue during course of about a 2-hour cleaning period on two occasions. Solid line, … Numbers on abscissa, time in minutes. Letters on ordinate, degrees of fatigue (G = great; M = moderate; L = little; N = no). Number in upper left corner indicates subject.
Table 1








Great Fatigue 





Work 


Rooms quality 











Moderate Fatigue 





Work 


Rooms quality 





3 
5 
4 
3 
5 
3 
7 
3 
4 





4.1 
3to7 





Little or No Fatigue 





Work 


Rooms quality 





10 
18 

8 
15 


1 
1 18 





13.6 
8 to 18 


Avg. 2.6 
Range 1lto4 





much alike in the two work periods. The 
similarity between the two suggests that the 
feelings of fatigue in the cleaning periods from week to week tend to become somewhat fixed in pattern, involving a similarity in the housewife’s attitude toward her tasks, and involving the same appraisal of the effects of the cleaning tasks upon her. Some women expect to get very tired; others do not.

Table 1 indicates some relations between fatigue, number of rooms cleaned, and quality of work. The interviewer rated each woman’s work on an 18-point scale, in which the proper use of equipment counted 3; removal of dirt, 12; and reconditioning of furniture, 3. The count given removal of dirt was subdivided into four categories of 3 each.

Of the 20 women, those who became most 
tired cleaned, on the average, fewer rooms than 
those who became moderately tired, but more 
rooms than those who became only slightly 
tired, or not tired at all. The work quality of 
those who became most tired was poorer, on 
the average, than that of those who became moderately tired, only slightly tired, or not tired at all.


Conclusions 


The positive relations between fatigue and 
other factors that were suggested by the find- 
ings were those involving the subjects’ atti- 
tudes. Those who became bewildered by the 
clutter; those who found making decisions 
difficult regarding what to do next; and those 
who had a general background of distaste for 
work, were the individuals who became tired. 
There was, as was to be expected, the common lack of distinction in the minds of the subjects between localized muscle discomfort from stooping, etc., and the over-all personal experiential state we identify as fatigue. Enough
work was done in the study to convince the 
investigators that it is practical to study fatigue 
from the non-energistic standpoint. 


Received August 14, 1950. 








Influence of Inertia in Making Settings on a Linear Scale * 


William Leroy Jenkins, Louis O. Maas, and Merritt W. Olson 
Lehigh University 


Two previous studies (1, 2) have investigated 
the influence of a number of variables in mak- 
ing settings by a control knob on a linear scale. 
The most significant variable appears to be the 
ratio between pointer-movement and knob- 
turn. A ratio of one or two inches of pointer- 
movement for one complete turn of the knob 
seems to be optimal under a wide variety of 
conditions. As to the other variables studied: 
Knob diameter proves to be relatively unim- 
portant, as long as the knob is of a size that 
can be grasped conveniently. Backlash is 
annoying to the operator, but has no practical 
influence on his performance as long as the 
optimal ratio is used. Friction is detrimen- 
tal, especially in making adjustments that re- 
quire large amounts of travel. The present 
study is concerned with the effects of inertia, 
both in its own right and in interaction with 
other variables. 


Apparatus and Procedure 


The apparatus and procedure in the current 
investigation were essentially the same as de- 
scribed for the previous studies (1,2). The 
subject matches the position of a lighted insert 
in a black bakelite scale with a pointer controlled by a rotary knob. The error tolerance
permitted is determined by the width of the 
pointer in relation to the width of the lighted 
insert. 


By means of two chronoscopes, time is 
measured separately for travel to the approxi- 
mate location and for making the final adjust- 


ment. Similarly, action potentials from the 
active forearm are accumulated and measured 
separately during travel and during final ad- 
justment. Mean travel time is computed for 
two standard distances: 10 sixteenths and 50 
sixteenths of an inch. Mean total time in 
tenths of a second is then computed as mean travel
time plus mean adjusting time. Analogous 


* This research was executed under Contract No. 
W33-038-ac-22561 between the Institute of Research, 
Lehigh University, and the USAF Air Materiel Command, Aero Medical Laboratory, Wright-Patterson
Air Force Base, Dayton, Ohio. 


computations are made for action potentials in 
terms of meter-scale readings. 

Inertia was added by attaching lead fly- 
wheels to the main control shaft which is 
turned by the subject’s knob. Each flywheel 
weighs 23 oz. and is 4” in diameter. One fly- 
wheel gives just noticeable inertia. Two fly- 
wheels provide quite marked inertia. Four 
flywheels supply enough to make the pointer 
travel clear across the scale following a single 
sharp twist of the knob. 
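A rough figure for the added inertia can be obtained with a Python sketch that treats each lead flywheel as a uniform solid disc, whose moment of inertia about the shaft is ½mr², and sums the discs. The uniform-disc shape and the neglect of the shaft, knob and pointer mechanism are assumptions made here for illustration; the text gives only the weight (23 oz.) and the diameter (4 in.).

```python
OZ_TO_KG = 0.028349523125  # avoirdupois ounce in kilograms
INCH_TO_M = 0.0254


def disc_inertia(weight_oz: float, diameter_in: float) -> float:
    """Moment of inertia (kg m^2) of a uniform solid disc about its axis: m r^2 / 2."""
    mass = weight_oz * OZ_TO_KG
    radius = diameter_in / 2 * INCH_TO_M
    return 0.5 * mass * radius ** 2


PER_WHEEL = disc_inertia(23, 4)
for wheels in (1, 2, 4):  # the three inertia conditions tested
    print(f"{wheels} flywheel(s): about {wheels * PER_WHEEL:.2e} kg m^2 added")
```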

Friction was added by means of a Prony 
brake system as previously described (2). 
Unless otherwise stated, the 2?” knob and a 
ratio of 1.18 inches of pointer movement for 
one complete turn of the knob were used 
throughout. Except for the final set of experi- 
ments the error tolerance was .007”. 
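The control ratio fixes how far the knob must be turned for each standard travel distance. A small sketch using only the definition given here (inches of pointer movement per complete knob turn); `knob_turns` is a hypothetical name, and the ratios 2.42 and 6.28 are those compared later in the article.

```python
SIXTEENTH_IN = 1.0 / 16.0  # travel distances are stated in sixteenths of an inch


def knob_turns(distance_sixteenths, ratio_in_per_turn=1.18):
    """Complete knob revolutions needed to move the pointer a given distance.

    ratio_in_per_turn is inches of pointer travel per complete knob turn.
    """
    if ratio_in_per_turn <= 0:
        raise ValueError("ratio must be positive")
    return distance_sixteenths * SIXTEENTH_IN / ratio_in_per_turn


for ratio in (1.18, 2.42, 6.28):
    for d in (10, 50):
        print(f"ratio {ratio}: {d}/16 in -> {knob_turns(d, ratio):.2f} turns")
```

At ratio 1.18 the 50-sixteenths travel takes about 2.65 turns of the knob, compared with about 1.29 turns at 2.42 and about half a turn at 6.28.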

Eight different subjects, all young men, were 
used in the various experiments. Two of 
them (BRR and DMS) had taken part in the 
earlier studies. 

Results 


Influence of Inertia at Optimal Ratios. 
Figure 1 shows the effect of adding one, two 
and four flywheels to provide inertia. The 
open symbols indicate adjusting time and po- 
tential, while the solid symbols show total time 
and potential for a travel distance of 50 six- 
teenths. For clarity, the results of one sub- 
ject (JEA) are plotted separately. Two 
ratios (1.18 and 2.42) were employed but are 
not separately indicated in the figures. 

It is evident that added inertia has no prac- 
tical effect on performance, either on adjusting 
time or on total time for 50 sixteenths travel. 
Nor is there any indication of a consistent 
influence of added inertia upon action poten- 
tials. In general, subjects expressed a prefer- 
ence for the inertia provided by one flywheel. 

Interaction of Inertia and Backlash. It was previously reported (1) that backlash has little influence on performance as long as the optimal ratio is employed. The suggestion has been advanced that the addition of inertia might alter this finding.

1 Tabulated data corresponding to the figures can be found in AF Technical Report No. 6038.

Figure 2 shows the results of an experiment 
to check this hypothesis. The open symbols 
indicate adjusting time and potential; the solid 
symbols show total time and potential for a 
travel distance of 50 sixteenths. For clarity, 
the results of one subject (JEA) are plotted 
separately. The sets of symbols at the left of 
the vertical boxes give results without added 
inertia. Except for a minor increase for sub- 
ject JEA, these confirm the earlier findings with 
other subjects: That is, backlash has little or 
no influence on either time or potentials. 

The symbols within the vertical boxes show 
the results with the added inertia of four fly- 
wheels. The general effect of inertia-plus- 
backlash is to increase slightly the adjusting 
time and potential (open symbols). This increase is then reflected in a similar raising of total time and potential. As will be seen, the overall effect is relatively minor.

Fig. 1. Influence of inertia.

Interaction of Inertia and Friction. The detrimental effects of friction have been previously reported (2). To what extent can these be overcome by deliberately adding inertia to the system?

Figure 3 shows the results of the interaction of the inertia of four flywheels with varying amounts of friction. The experiments were actually performed in two sets: 0, 200, and 500 friction; and 0, 500, and 1,000 friction. The friction units are grams of pull required at the periphery of the 23” knob, as previously described (2). The open symbols indicate adjusting time and potential; the solid symbols show total time and potential for a travel distance of 50 sixteenths.
The sets of symbols at the left of the vertical
boxes give results without added inertia. These 
confirm the earlier findings: Friction is definitely 
deleterious. The symbols within the vertical 
boxes show the results with the added inertia of 
four flywheels. The general effect of added 
inertia is to decrease somewhat the harmful in- 
fluence of friction. Throughout there is a 
consistently lower time and potential with 
added inertia. The effects are confined to 
travel time and potential, adjusting time and 
potential being little affected by such changes. 
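Friction is specified as grams of pull at the knob periphery, so the corresponding torque depends on knob radius. A sketch assuming a simple torque = force × radius model, and assuming the Prony brake acts on the shaft so that the torque stays the same when a different knob is fitted; the helper names and the 2-inch diameter in the check are hypothetical, and the knob diameter is left as a parameter.

```python
G_FORCE_N = 9.80665e-3  # one gram-force, in newtons
INCH_M = 0.0254         # one inch, in metres


def friction_torque(grams_pull, knob_diameter_in):
    """Torque (N*m) implied by a pull, in grams, measured at the knob rim."""
    radius_m = (knob_diameter_in / 2.0) * INCH_M
    return grams_pull * G_FORCE_N * radius_m


def rim_pull_grams(torque_nm, knob_diameter_in):
    """Pull (grams) needed at the rim of a given knob to overcome a torque."""
    radius_m = (knob_diameter_in / 2.0) * INCH_M
    return torque_nm / (G_FORCE_N * radius_m)
```

Under these assumptions the pull needed at the rim falls in inverse proportion to knob diameter: the torque set by 500 grams of pull at a 2-inch rim would need only 250 grams at a 4-inch rim.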

Inertia-Plus-Friction and Ratios. The next experiments arose from the suggestion that a much coarser ratio than one or two inches of pointer movement per turn of the knob might prove optimal under conditions of added inertia-plus-friction. Figure 4 gives the results of an experiment designed to check this hypothesis. The interpretation of the symbols is similar to that in the previous figures. Added inertia was four flywheels; added friction was 500.

Fig. 2. Interaction of inertia and backlash.

The experimental findings negate the hypothesis. Even for a travel distance of 50 sixteenths, neither total time nor total potential becomes lower at ratio 6.28 than at ratio 2.42, which is normally optimal for long travel distances. It will be noted that the effects of inertia-plus-friction show up much more markedly in potential than in time.

Inertia-Plus-Friction and Knobs, Normal Tolerance. It has been suggested that knob diameter might become an important factor under conditions of added inertia-plus-friction, even though it is not under normal conditions (1). Figure 5 shows the results of an experiment involving six knob diameters under normal conditions and with added inertia (four flywheels) plus friction (500). The interpretation of the symbols is similar to that in previous figures.

The symbols to the left of the vertical boxes confirm the previous findings under normal conditions (1). That is, there is no consistent change in adjusting time or in total time with change in knob diameter. On the other hand, potentials are somewhat less with the larger sizes.


Fig. 4. Inertia-plus-friction and ratios. (Panels: time and potential at ratios 1.18, 2.42, and 6.28, with and without added inertia-plus-friction; subjects RJD, BRR, DMS, JRW.)
William Leroy Jenkins, Louis O. Maas, and Merritt W. Olson
Fig. 5. Inertia-plus-friction and knobs, normal tolerance. (Panels: time and potential by knob diameter, with and without added inertia-plus-friction; subjects RJD, BRR, DMS, JRW.)

Fig. 6. Inertia-plus-friction and knobs, severe tolerance. (Panels: time and potential by knob diameter, with and without added inertia-plus-friction.)


Influence of Inertia in Making Settings on a Linear Scale 


With added inertia-plus-friction, adjusting 
time and total time are lengthened more with 
the smaller knob sizes (1}”’ and 1}’’) than with 
the larger. The effects are even more marked 
with adjusting potential and total potential. 
With added inertia-plus-friction, the larger 
knob sizes appear to be distinctly more favor- 
able when large travel distances are involved. 

Inertia-Plus-Friction and Knobs, Severe Tol- 
erance. Requiring a more severe tolerance 
throws more emphasis on adjustment and 
tends to increase adjusting time and potential. 
Figure 6 shows the results of experiments in-
volving six knob diameters with the severe
tolerance limit of .003”. This tolerance ap- 
proaches the limit of the subjects’ perceptual 
ability to tell when they have a correct setting. 
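The tolerance limits can be expressed as knob rotation at the standard ratio of 1.18, which gives a sense of how fine the .003” setting is. A sketch assuming the tolerance is the total pointer travel that still counts as a correct setting; the helper names are hypothetical.

```python
import math


def tolerance_degrees(tolerance_in, ratio_in_per_turn=1.18):
    """Knob rotation (degrees) corresponding to the pointer tolerance."""
    return tolerance_in / ratio_in_per_turn * 360.0


def rim_arc_in(tolerance_in, knob_diameter_in, ratio_in_per_turn=1.18):
    """Arc (inches) swept at the knob rim while the pointer moves the tolerance."""
    turns = tolerance_in / ratio_in_per_turn
    return turns * math.pi * knob_diameter_in


for tol in (0.007, 0.003):
    print(f"{tol} in -> {tolerance_degrees(tol):.2f} deg of knob rotation")
```

With these assumptions, .007” corresponds to about 2.1 degrees of knob rotation and .003” to about 0.9 degrees; for a fixed angle, the arc swept at the knob rim grows in proportion to knob diameter.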

Although the general level of adjusting times 
and adjusting potentials is raised throughout, 
there is certainly no greater effect of knob di- 
ameter than at a tolerance limit of .007”, as 
far as total times and total potentials are con- 
cerned. 

Summary 


For matching settings on a linear scale, added inertia as such has little or no practical effect on performance. Inertia in combination with backlash tends slightly to increase adjusting time, but the overall effect is relatively minor. When heavy friction is present, adding inertia tends to compensate for the deleterious effects of the friction to some extent. Combined inertia-plus-friction does not change the optimal ratio of one or two inches of pointer movement for one complete turn of the knob. With combined inertia-plus-friction present, the larger knob diameters are somewhat superior to diameters smaller than two inches, both with normal tolerance and with severe tolerance requirements.


Received August 14, 1950. 
Book Reviews


Fraser, John Munro. A handbook of employ- 
ment interviewing. London: MacDonald and 
Evans, 1950. Pp. 212. 8/6d. 

In spite of an apparent unfamiliarity with the work on the patterned interview and allied procedures as selection techniques which has been done in this country (at least no reference is made to it), Mr. Fraser has developed an approach which parallels it very closely. He recognizes the need for having comprehensive job descriptions and specifications as a basis for selection. Likewise, he is aware of the contribution which a detailed application form can make and recommends a suitable one. He is aware of the parts played by intelligence and aptitude tests but does not elaborate their contributions. He is less aware of the importance of preliminary screening standards and makes no allowance for reference checks of any kind, either written or by telephone. Neither does he suggest the recording of the interview findings in any formal and organized manner. (Apparently his interview is based principally on a somewhat informal review of the data on the application form.) The interviewer guides the general course of the conversation, but gives the applicant a great deal of latitude in determining the content of the discussion.

Mr. Fraser is cognizant of the need to pro- 
vide an appropriate setting for the interview. 
He stresses the importance of a friendly initial 
reception for the applicant and the parts to be 
played by courtesy and friendliness in his hand- 
ling before and during the interview. He rec- 
ognizes the dangers of the “cross examination” 
situation in which the applicant is not en- 
couraged to volunteer material about himself. 

The author regards (a) Physique (health 
and strength, outward appearance and man- 
ner, physical energy); (b) Attainments (gen- 
eral education, specialized training, work ex- 
perience); (c) General intelligence (the cap- 
acity for complex and intricate mental work); 
(d) Special aptitudes (the predisposition to
acquire certain types of skill); (e) Interests
(liking for social, intellectual, practical-con-
structive or physically-active work); (f) Dis-
position (the ability to undertake a role which
involves steadiness or reliability, acceptability
to, or influence over others); and (g) Circum-
stances (or the levels of expectation which the
job will satisfy) as important in the evaluation 
of the qualifications of an applicant for a par- 
ticular opening. All of these are commonly 
regarded as important by workers on this 
problem in this country. On the other hand, 
while he does make reference to “physique,” 
it is less in the sense of health than of body 
build and those physical attributes which are 
important in “doing the job.” He does not 
place as great emphasis on “health” as distinct 
from “physique” as is common here. 

The greatest difference (and limitation) be- 
tween his procedure and those most commonly 
used in this country is that he presents no 
systematic and organized set of interpretive 
concepts to be used in analyzing the applicant 
and determining his suitability for a particular 
opening. In common with the thinking here, 
he postulates that an individual’s behavior con- 
stitutes the best guide to his qualifications and 
that modes of action, once established, tend to 
repeat themselves and can, therefore, be used 
as a basis for predicting what he may be ex- 
pected to do. Likewise, he recognizes the in- 
fluence of the family environment in childhood 
as a factor. He also makes use of the concept 
of “interests” as a measure of the applicant’s 
“motivations.” However, he has no concepts
comparable to those of “character” or specific
“habit patterns” or of “emotional immatur- 
ity.” Furthermore, he makes no systematic 
effort to evaluate the influence of the appli- 
cant’s domestic situation. He makes a rather 
naive reference to Freud “as concerned wholly 
with sex” which indicates his unfamiliarity 
with the latter’s work. Beyond this, he ap- 
pears to be largely unfamiliar with the psychi- 
atric concepts which apply in this field. 

To summarize, the book is simply and inter- 
estingly written. The principles it enunciates 
are sound. For a beginner in the field it will 
have much value because its approach is strictly 
“common sense.” Its greatest limitation is 
that it offers little that is new, i.e., it contributes 
no strikingly great advances in concept or 
technology. Probably its greatest contribu- 
tion is the picture which it provides of the 
current state of personnel thinking in Great 
Britain where it is apparent that there is still 
considerable resistance to the assembly of facts
about an applicant prior to his employment.
This conviction that such inquiries are an in- 
vasion of the individual’s personal privacy is 
neatly illustrated by the author’s quotation 
from a recent Spectator article: “The Post-
Master General is curious. Before he will let
you help deliver the Christmas mail, he re- 
quires full details of your past, your health and 
your mother’s birthplace.” 
Robert N. McMurry 


Robert N. McMurry and Co.,
Chicago, Illinois 


Dale, Edgar. (Ed.) Readability. Chicago: 
National Council of Teachers of English, 
1949. Pp. 44. $.60 single copy. $.50 
ten or more. 

Here is a distinct one-case validation of the 
slogan that good things come in small packages. 

Readability is a collection of five articles by 
five research specialists which originally ap- 
peared in the journal Elementary English from 
January to May, 1949. Each article concerns 
itself with a particular segment of the complex 
problem of making writing readable, and each 
is concise, critical, and informative. 

This collection is not a mere historical survey 
of the problem, although the background sec- 
tions and bibliographies are excellent. Neither 
is it a simple recital of current readability ac- 
complishments. There is a strong emphasis 
on methodology throughout. The article by 
Harold Burtt on typography and readability 
contains an admirable summary of experi- 
mental designs and criteria in this field. 

The scope of the collection is wide enough to 
include consideration of such important read- 
ability aids as audience interests, audience 
background information, conceptual difficulty 
of material, and typography as well as the 
comprehension measures to which most read- 
ability formulas are restricted. 

To the reviewer, a most impressive thing 
about these articles is the thorough analysis 
they make of areas where current readability 
knowledge is thin. Edgar Dale and Jeanne 
Chall in the opening article discuss the vari- 
ables which must be taken into account in a 
comprehensive appraisal of readability. They 
make it plain that despite marked progress, 
much remains to be done. 
Irving Lorge in his article evaluating read-
ability formulae critically analyzes the criteria
upon which many formulae are based. No 
researcher, after reading Lorge’s article, will be 
tempted to accept any readability measure at 
its face value. 

E. W. Dolch presents a very complete ac- 
count of the use and misuse of vocabulary lists 
in assessing readability. His discussion of the 
Thorndike, Dale, Dale-Chall, Gates, Lorge and 
Rinsland lists gives the reader a thorough 
orientation in this division of readability re- 
search. 

In a final article, Dale and Chall outline a 
number of techniques for writers to use in 
striving to produce a readable product. 

One omission in the collection is the lack of 
any attention to the recent successful applica- 
tion of readability research in personnel work, 
industry and advertising. This is probably 
explained by the authors practicing their own 
doctrine. The typical subscriber to Elementary 
English is more concerned with the level of 
children’s textbooks than with the pulling 
power of advertising copy. 

Such publications as this one, it seems to the
reviewer, augur well for the continued progress 
of readability research. 


Robert L. Jones 
University of Minnesota 


Tredgold, R. F. Human relations in modern 
industry. New York: International Uni- 
versities Press, Inc., 1950. Pp. 192. $2.50. 

The title of this volume is somewhat similar to several used by American psychologists previously. However, there is little similarity between the contents of this book and the other textbooks, which are more definitive and usually make their points with more authority. Indeed, it would be fairer to consider this a compilation of essays. Each chapter was the basis of a lecture in courses on “Human Relations in Industry” at Roffey Park Rehabilitation Centre (London) held in 1947-48. In place of the usual bibliographies, reference material, and footnotes, the author provides the reader with appropriate quotations of the great philosophers and essayists of the past and present. The tenor of this little book is modest, with appropriate common-sense notions, and reflects the author’s self-critical attitude throughout the lecture series, which he successfully carried over into the published text.

The author’s dedication reflects the spirit 
mirrored in the remainder of the volume: “To 
all those persons of sound mind who are inter- 
ested in the welfare of their fellow workers.” 
The author goes on to relate: “As a matter of 
historical interest, it may be noted that I have 
ventured to paraphrase the dedication of my 
father’s book, Mental Deficiency, first published 
in 1908 which read, ‘To all those persons of 
sound mind interested in the welfare of their 
less fortunate fellow creatures.’ ” 

The book outlines the psychologist’s con- 
tribution to problems of leadership, incentives, 
personnel management, absenteeism, hours of 
work and leisure, mental first-aid, etc. The lecture series was provided to patients who were at the Centre for rehabilitation for physical as well as psychological difficulties. The composition of the group therapeutic sessions varied, but on several occasions included persons in industrial relations work in unionism, private industry, and government service. The orientation for many of these essays is prevention and mental hygiene, with a clinical approach to industrial problems rather than the application of psychological tests, surveys and other applied psychological studies. Didactic lectures were avoided and, instead, the technique of “situation discussion”
as adapted from Harvard was introduced to 
these various British groups. “The first 
groups were encouraged to express their own 
needs by something resembling a leaderless 
group technique . . . the unstructured nature 
of the discussions was exceedingly exciting and 
led to significant information for the leader as 
well as for the student.” The most striking 
feature of the courses was the ease with which 
the members got to know each other and to see 
each other’s points of view, which, indirectly, 
was the goal and the purpose for these lectures 
in the first place. 

This volume is recommended to American 
audiences who wish to supplement their more 
technical libraries of books on “applied psy- 
chology” with a truly humanistic approach to 
industrial situations. The author is not a psy- 
chologist, but Regional Psychiatrist to the 
S. E. Metropolitan Hospital; Assistant Physi-
cian in the Department of Psychological Medi-
cine, University College Hospital; Boots
Lecturer at the Roffey Park Rehabilitation 
Centre; and Psychiatrist to the Maudsley 
Hospital, all in London. The author is ex- 
ceedingly well informed on psychological issues 
and displays an humble attitude which, un- 
fortunately, is all too often lacking in persons 
working in industrial settings, but which is 
frequently a necessary adjunct to employee- 
management relationships. The reader who 
wishes “to think for himself” will be rewarded 
by some of the provocative thoughts the author 
offers. Furthermore, he avoids specific solu- 
tions to problems which do not permit clear- 
cut, “cast-iron” conclusions. 


Arthur Weider 
School of Medicine, 
University of Louisville 


Otis, Jay L., and Leukart, Richard H. Job 
evaluation: A basis for sound wage adminis- 
tration. New York: Prentice-Hall, 1948. 
Pp. xv + 473. $6.65. 

The purpose of this book is to provide for 
the fields of education and industry an organ- 
ized presentation of the essentials of wage and 
salary administration based upon job evalu- 
ation—thus to permit the executive, the union 
leader, and the student to concentrate on the 
basic essentials and achieve an understanding 
of each aspect. 

The reviewer agrees with the general ration- 
ale of this book. Job analysis, description, 
and specification are presented as sine qua non 
in sound job evaluation, and the latter as a 
means to effective wage and salary administra- 
tion rather than as an end in itself. It is re- 
freshing to find a more comprehensive treat- 
ment of job analysis and its products and of 
the pitfalls of rating procedure than custom- 
arily appears in books on this subject. The 
organization of the materials, and the clarity 
and completeness of the presentation, however, 
leave something to be desired. 

After an introductory chapter there follow 
three chapters on job evaluation systems which 
take up the ranking, the grade-description, 
the point, and the factor-comparison methods; 
next is a chapter on planning a job evaluation 
program, which is followed by discussions of 
job analysis, descriptions, and specifications. 
The foregoing occupy a little over half the 
volume. The last eight chapters deal with
job rating scales, rating verification, employee 
classification, the wage curve, establishing the 
classification and pay structure, job evaluation 
and collective bargaining, establishing the 
wage and salary administration system, and 
its use as a management control. 

Despite this book’s many good points certain 
criticisms are in order. First may be men- 
tioned the over-all chapter sequence. It is 
unfortunate that the unprepared reader is 
plunged almost immediately into a welter of 
job evaluation systems and their labels. This 
reviewer believes that the readers to whom this 
volume is addressed should first be informed 
about job composition and the techniques and 
vagaries of job classification; next, the extent 
to which job analysis and its products have 
aided in the solution of some of these problems; 
they then may be in a position to understand 
why job evaluation systems have arisen and 
the problems these systems attempt to solve. 
They should be spared a multiplicity of labels 
in the text which could advantageously be 
keyed into footnotes. 

Allied to the problem of chapter sequence is 
the organization of particular chapters relative 
to others. Scattered throughout the volume 
are materials which might profitably be in- 
corporated into sections on techniques of 
analysis, description, and classification; then, 
too, the reader might be spared considerable 
repetition. One example is the frequently 
recurring subject of job titles, which is even 
given a separate sub-heading in the final 
chapter. Another is the repeated listing of 
certain standards applicable to job descriptions 
and specifications which incidentally are 
equally pertinent to the job analysis schedule. 

The practical problem of choosing a job 
evaluation method for a given company de- 
serves comment. Although the authors do 
provide lists of advantages and disadvantages 
of given systems, they do not specify differ- 
ential circumstances indicative for a given 
selection or of those favoring the development 
of a tailor-made system. The latter, though 
mentioned, receives no organized treatment. 
Chapter Five, “Planning a Job Evaluation 
Program,” indicates that a system is chosen 
prior to the analysis of the company’s jobs, 
which under some circumstances would be a
most undesirable procedure. 

Other criticisms, such as variation in level of 
detail, are of a minor nature and should not be 
included here. Most certainly, however, the 
merits of Job Evaluation outweigh its weak- 
nesses. 


Alan M. Kershner 


University of Maryland 


Williamson, E. G., and Foley, J.D. Counsel- 
ing and discipline. New York: McGraw- 
Hill Book Co., 1949. Pp. xii + 387. $3.75. 

Personnel psychologists, both educational and industrial, have long been in the habit of expecting to find helpful ideas, data, and tools in the publications of their professional colleagues working at the University of Minnesota. The publication of a book on Counseling and Discipline by two Minnesota staff members commands attention not only because of the fact that most of us have been thinking of counseling as being anything but discipline, but also because the authors can be expected to have something worth while to say on the subject.

In 1941 Williamson, as Minnesota’s Dean of 
Students, was asked to take responsibility for 
the administration of student discipline. Ap- 
parently because of the generally accepted idea 
that counseling and discipline do not mix well, 
special counselors were appointed to handle 
disciplinary problems. More important still, 
the research tradition prevailed and proce- 
dures were set up for the collection and analy- 
sis of data on disciplinary problems. Herein 
lies one of the most important contributions of 
Williamson and Foley’s book, for it is appar- 
ently the first detailed study of college dis- 
ciplinary problems and treatment. 

The book begins with a discussion of student 
misbehavior as an outgrowth of student life, 
thus placing discipline in its proper context as 
an integral part of student personnel work and 
showing that other aspects of personnel work 
are concerned with discipline in a preventive 
way. This is followed by a treatment of be- 
havior theory as related to discipline, em- 
phasizing that behavior and misbehavior are 
learned. Disciplinary counseling is viewed as 
aiming to remove the causes of misbehavior 
and to help the student learn adaptive modes of 
behavior. The administration of disciplinary
counseling at Minnesota is then described in 
detail. 

Chapters IV and V describe, classify, and 
tabulate the frequency of student disciplinary 
problems. The classification is in terms of 
behavioral similarities, despite the desire of 
some investigators to develop classifications 
based on underlying dynamics: as the authors 
point out, the latter type of classification is 
difficult both to develop and to use, and the 
behavioral classification is useful because it 
describes the observed problem. This ma- 
terial should be helpful to persons administer- 
ing discipline in other institutions, because of 
the perspective which classification and tabula-
tion can give on problems. Chapter VI,
“Methods of Investigation,” impresses this re-
viewer as one of the weakest, because it deals
with both methods and dynamics in insufficient 
detail to be helpful to the student. Perhaps 
this book is not the place for detailed treatment 
of methods of investigation: if so, it should 
have been more explicitly recognized. The 
authors do state that the book is not intended 
as a detailed treatment of the dynamics of mis- 
behavior: less reference could therefore have 
been made to the work of Healy and Bronner, 
the almost exclusive source on delinquency, 
and it is to be hoped that Minnesota data will 
be analyzed to throw more light on the dy- 
namics of student misbehavior. 

The final chapter, entitled “Counseling as 
Rehabilitation,” is an excellent discussion of a 
constructive approach to discipline. The ma- 
jor hypothesis as formulated by the authors 
well summarizes it: “In disciplinary situations, 
the counseling process helps the individual to 
face and gain insight into the consequences of 
his delinquent behavior, aids him in under- 
standing the motivations and behavioral pat- 
terns which underlie his special conflict, and 
assists him in acquiring that personal growth 
and integration which facilitates the develop- 
ment of a more socially satisfactory and per- 
sonally satisfying personality structure. In 
this sense, the counseling process promotes and 
effects rehabilitation and is, in its own right, a 
rehabilitation process. Within its inherent 
limitations, disciplinary counseling is rehabili- 
tation.” 

A 158-page Appendix consists of case ma-
terials. Reading these with a psychothera-
peutic orientation, as the reviewer did despite
the authors’ stress on the fact that disciplinary 
counseling is not identical with therapy, is 
likely to leave one with an unfavorable impres- 
sion: the superficial behavioral problem seems 
too often to have been dealt with, rather than 
the underlying personality problem. But 
then one must remember that the objective of 
disciplinary counseling is to teach more adap- 
tive modes of behavior, and when that can be 
done, as it seems to have been done, without 
the time and expense of therapy, the method is 
justified. When intensive treatment of per- 
sonality problems seemed called for, attempts 
were made to help the student obtain it. 

In conclusion, this is not a book that coun- 
selors will find helpful if they do not encounter 
problems involving discipline. But if they do, 
they will find that it does a good deal both to 
clarify the nature of the distinction between 
therapy and discipline, and to show how coun- 
seling can contribute to the handling of dis- 
cipline. 


Donald E. Super 


Teachers College, 
Columbia University 


Froehlich, Clifford P. Guidance services in 
smaller schools. New York: McGraw-Hill 
Book Company, Inc., 1950. Pp. viii + 352. 
$3.75. 

The flow of new titles in the stream of “guid- 
ance” literature has long since reached such 
volume as to present a formidable challenge 
even to the venturesome educator. The com- 
posite of currents, counter-currents, eddies, 
undertows, and just plain still water, can prom-
ise only a rough and uneven course to those who
would pilot the guidance program craft or man
its crew.

Froehlich’s “Guidance Services in Smaller 
Schools” is a welcome addition to the flow be- 
cause of its strength, unity, and consistency of 
direction. He sets out to give substance to 
his thesis that an effective guidance program is 
possible in a small school. His aim is reached 
through the clarity of his definition and identi- 
fication of the guidance services, and even 
more through the presentation of numerous 
described practices in widely scattered schools. 
It is the latter feature of the book which
makes it unique. And, while the practices are 
not always in perfect agreement with the de- 
fined guidance services, yet those presented are 
carefully selected and are in harmony with one 
of Froehlich’s basic assumptions, that the guid- 
ance services cannot be superimposed upon a 
school but must become, through a process of 
gradual growth, an integral part of the school’s 
program. Moreover, the inclusion of the 
descriptions of actual practices tends to close 
the traditional gap between idea and action. 

The book will meet many of the needs of 
three kinds of users—those who are carrying on 
identified guidance activities, those who are 
engaged in organizing and administering guid- 
ance programs, and those who are engaged in 
the professional preparation of guidance work- 
ers. However, it is not intended as a complete 
compendium; selected references are listed at 
the end of each chapter. 

This reviewer wonders how many others will 
be both intrigued and somewhat provoked by 
the title of the book. The author studiously 
avoids defining the “smaller school.” One 
wonders if this is due to an awareness that 
factors other than mere enrollment are criteria 
of “smallness.” Also, if the title was chosen 
through a shrewd calculation of the problem 
of distribution, it was an adroit selection. 
What are the relative proportions of “smaller” 
and “larger” schools in the country? It is the 
reviewer’s feeling that the size of a school is 
not especially important in shaping the de- 
velopment of its guidance program; and that 
essential differences in practice due to size are 
relatively few. Both “larger” and “smaller”
schools will find good use for Froehlich’s con-
tribution to the guidance literature.
Fred M. Fowler 
Department of Public Instruction,
The State of Utah 


Goodenough, Florence L. Mental testing:
its history, principles, and applications.
New York: Rinehart & Co., Inc., 1949.
Pp. 609. $5.00.

Mental testing as treated in this book is 
broadly defined, to include the measurement of 
achievement and personality characteristics 
in addition to intelligence. In its concern with 
general mental life there is evidence through-
out that the handling of history and applica-
tions is subordinated to the greater interest in
scientific principles and methods. Of the 
pages in the book, about 95 are devoted to the 
historical orientation, about 200 to principles 
and methods, including much statistical theory, 
about 150 pages to tests and scales, and about 
185 pages are devoted to applications. The 
title of the book does correctly indicate its 
content, since history, principles, and applica- 
tions are woven together in a natural way in 
the treatment of topics throughout the book. 
At the end one finds a twenty-six page glossary 
of technical terms, twenty-one pages of bibli- 
ography, and thirteen pages of author and sub- 
ject indexes. 

The book is intended for students entering 
the field of testing, and for professional workers 
in schools and institutions who have occasion 
to use tests and test results. There is material 
of interest to clinical students, psychometrists,
school psychologists, school principals and
teachers, social workers, psychiatrists, pedi-
atricians, judges in juvenile courts, employ-
ment experts, and school counselors.

The treatment, of course, touches upon a
great many topics which are not dealt with ex- 
haustively. For example, the person inter- 
ested in uses of tests with juvenile delinquents 
will not find all the material on juvenile de- 
linquency, but he will find what he needs to 
know about testing, and some good source 
materials to get him started correctly in his 
field of special interest. 

The professional workers listed above, who 
are not primarily psychologists, will find here 
a readable introduction and explanation of
psychological testing techniques, including such
newer devices as projective techniques. Stu- 
dents of psychology, even relatively advanced 
ones, will find the reading refreshing as well as 
instructive. 

Goodenough’s Mental Testing is a scientific 
book. It bristles with footnotes which give 
further information on technical points, and 
the text throughout is packed with information 
essential to a well-organized discussion of psy- 
chological testing. Rarely does one find a 
book which offers so much instructive material 
for both elementary and advanced students. 

Few persons could have written such a text. 
Not only does it show the results of years of 
careful study, but it has an individual tone and
flavor, being written by a well-trained psychol- 
ogist, with a special interest in technical and 
theoretical problems of measurement, and a
wealth of experience with children. The re- 
sult is originality of treatment of topics, 
abundance of illustrative material, and correct- 
ness of viewpoint on technical and theoretical 
matters. The book is the scholarly general 
treatment of mental testing which has been 
needed for a long time. It makes a rare and 
valuable contribution to psychological litera- 
ture. 
Harold D. Carter 
University of California 
at Berkeley 


Levinson, Horace C. The science of chance. 
New York: Rinehart, 1950. Pp. 348. $2.00
(paper back). 

This book, written by a mathematician, 
carries the subtitle “From Probability to Statis- 
tics,” and is divided into two parts: Chance 
and Statistics. The first part (200 pages) is 
a very readable nontechnical presentation of 
the concepts of probability and their bearing 
on such games of chance as poker, roulette, 
craps, and bridge. The reviewer unreservedly 
recommends this part to two classes of readers: 
those who need a knowledge of probability in 
order to appreciate its applications in scientific 
research and those who have a vocational or 
avocational interest in games of chance. 

Although the author seems to think he has 
made “clear the nature of the modern science
of statistics” (p. 270), the second part falls
far short of this goal. An exposition of statis-
tics which utilizes the antiquated “probable
error,” which discusses briefly the normal, 
Bernoullian, Poisson, and Lexis distributions, 
which fails to mention t, chi square, and F, and
which never comes to grips with even elemental 
significance tests and confidence limits, can 
hardly be characterized as “modern.” In- 
deed, it can be said that this book contains 
nothing on statistical techniques that couldn’t 
have been said 30 years ago. 

To illustrate the level of statistical reasoning, 
let us consider an example (pp. 304-306) which 
is concerned with whether a new style “ad” 
which pulled 250 responses is better than an 
old one which pulled 200 responses. We are 
told that our decision must be made on the
basis of information at hand: “the necessity
for immediate action . . . prevents repetition” 
of the experiment. The author says that “such 
a problem is solved in practice” by imagining 
that we have a hundred repetitions of the old 
ad, that these repetitions yield a frequency 
distribution of number of responses with a 
(supposed) mean of 218, and that by comput- 
ing the standard deviation we can determine 
the probability of obtaining the 250 responses 
yielded by the new ad. The reader is not 
given a rule for pulling a distribution with 
mean of 218 from an imaginative storehouse. 
By this devious route the author arrives at a 
P of .18, whereas the unmentioned chi square 
technique, based entirely on the available 
information, yields a P of .02 for as large a 
difference between the old and new. 
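
The chi square figure cited above can be checked with a few lines of arithmetic. The Python sketch below assumes (the review does not spell this out) that under the null hypothesis the two ads had equal exposure and equal pulling power, so each would be expected to draw half of the 450 responses; with one degree of freedom the upper-tail probability follows from the complementary error function. The function names are illustrative only.

```python
import math


def chi_square_equal_split(counts):
    """Goodness-of-fit chi square against equal expected counts.

    Illustrative assumption: under the null hypothesis each ad is
    expected to draw the same share of the total responses.
    """
    total = sum(counts)
    expected = total / len(counts)
    return sum((obs - expected) ** 2 / expected for obs in counts)


def p_value_df1(chi2):
    """Upper-tail probability of chi square with 1 degree of freedom.

    With df = 1, chi square is the square of a standard normal z,
    so P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


old_ad, new_ad = 200, 250
chi2 = chi_square_equal_split([old_ad, new_ad])
p = p_value_df1(chi2)
print(f"chi square = {chi2:.2f}, P = {p:.3f}")  # prints: chi square = 5.56, P = 0.018
```

The resulting P of about .018 rounds to the .02 the review reports, and it is obtained without imagining any repetitions of the old ad.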

Incidentally, the P of .18 is interpreted as 
“4 chances out of 5” that the new is better 
than the old, whence “no one would hesitate 
to” adopt the new since the “betting odds are 
altogether in favor of adopting it.” This is a 
sad confusion of the gambler’s odds with the 
level of significance usually prescribed by 
statisticians as a basis for making either a 
scientific or a practical decision. 


Quinn McNemar 
Stanford University 


Bales, Robert F. Interaction process analysis. 
Cambridge, Mass.: Addison-Wesley Press, 
1950. Pp. xi + 203. $6.00.

This book presents a method of observing 
and analyzing the interpersonal behavior of 
small groups concerned with problem solving. 
It involves essentially the direct observation of 
persons in a group and the classification of all 
observed behavior into one of a set of twelve 
categories. The set of categories includes be- 
havior of four types: questions, answers, 
negative reactions, and positive reactions. As 
each behavior unit is expressed, e.g. a request 
for information, the observer records a symbol 
for it and an indication of the person who spoke, 
and to whom. As an aid in this process, a 
box-like apparatus is used which holds a list of 
the categories and an electrically driven tape 
on which the symbols are written. The tape 
can later be compared with a sound recording 
made of the discussion. 
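
The recording procedure amounts to a stream of (category, speaker, addressee) entries that are later tallied. A minimal Python sketch of such a record follows. The `Act` type, the `tally` helper, and the sample record are hypothetical illustrations, not part of Bales's apparatus, and the assignment of category numbers to the four types is an assumption flagged in the code.

```python
from collections import Counter
from dataclasses import dataclass

# ASSUMPTION: the review names only the four types; the numbering below
# (1-3 positive, 4-6 answers, 7-9 questions, 10-12 negative) follows the
# commonly cited layout of Bales's twelve categories.
CATEGORY_TYPE = {}
for cats, kind in (((1, 2, 3), "positive reaction"),
                   ((4, 5, 6), "answer"),
                   ((7, 8, 9), "question"),
                   ((10, 11, 12), "negative reaction")):
    for c in cats:
        CATEGORY_TYPE[c] = kind


@dataclass(frozen=True)
class Act:
    """One scored behavior unit: its category, who spoke, and to whom."""
    category: int
    speaker: str
    target: str

    def __post_init__(self):
        # Reject anything outside the twelve categories.
        if self.category not in CATEGORY_TYPE:
            raise ValueError(f"category must be 1-12, got {self.category}")


def tally(acts):
    """Count acts by category, by type, and by (speaker, target) pair."""
    by_category = Counter(a.category for a in acts)
    by_type = Counter(CATEGORY_TYPE[a.category] for a in acts)
    by_pair = Counter((a.speaker, a.target) for a in acts)
    return by_category, by_type, by_pair


# Hypothetical record of a short exchange among members A, B, and C.
record = [
    Act(7, "A", "B"),   # a question from A to B
    Act(6, "B", "A"),   # an answer from B to A
    Act(3, "A", "B"),   # a positive reaction from A to B
    Act(10, "C", "B"),  # a negative reaction from C to B
]
by_category, by_type, by_pair = tally(record)
print(dict(by_type))
print(dict(by_pair))
```

Keeping speaker and addressee on every entry is what allows the later who-to-whom tables; the category tallies alone would discard that information.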

The theoretical framework on which the
categorization of behavior rests was developed 
as research progressed, and involves a number 
of points of view and assumptions. Among 
them is the conception of an idealized problem- 
solving sequence as beginning with emotionally 
neutral questions, and ending with emotion- 
ally positive or negative responses. Others 
which seem most important involve a con- 
ception of equilibrium, or tension-reduction, 
among group members; the assumption that 
every act contains cognitive, affective, and 
conative elements; and certain individual ob- 
jectives of each person in a group. 

Observers’ reliability is appraised by com- 
paring two or more records of the same dis- 
cussion, using the Chi Square test to determine 
the significance of differences. The test is 
made, ordinarily, for each category, with the 
average number of scores taken as the expected 
frequency, and the number of scores for an 
individual taken as the observed frequency. 
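
The per-category check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not Bales's own computation: it covers only two observers (hence one degree of freedom), the helper names are hypothetical, and apart from the category 10 counts, which the review later cites from Table 2, the counts are made up for illustration.

```python
import math


def category_chi_square(counts):
    """Chi square for one category across observers of the same discussion.

    As the procedure is described: the average count over the observers
    is the expected frequency, and each observer's count is an observed
    frequency.
    """
    expected = sum(counts) / len(counts)
    if expected == 0:
        return 0.0  # no observer scored this category at all
    return sum((obs - expected) ** 2 / expected for obs in counts)


def p_value_df1(chi2):
    """Upper-tail P for chi square with 1 df (the two-observer case)."""
    return math.erfc(math.sqrt(chi2 / 2))


# Category 10 (5 and 5) matches the counts cited from Table 2;
# categories 5 and 6 are hypothetical values for illustration.
observer_a = {5: 12, 6: 20, 10: 5}
observer_b = {5: 9, 6: 24, 10: 5}

for cat in sorted(observer_a.keys() | observer_b.keys()):
    a, b = observer_a.get(cat, 0), observer_b.get(cat, 0)
    chi2 = category_chi_square([a, b])
    print(f"category {cat}: A={a}, B={b}, "
          f"chi square={chi2:.3f}, P={p_value_df1(chi2):.3f}")
```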

No general conclusion regarding the validity 
of the method is proposed, but ways in which 
data obtained can be analyzed and interpreted 
are presented, largely through the use of 
graphs and tables. 

The task which the author took upon him- 
self in writing this book was probably no less 
complex and difficult than the problem—ob- 
jectifying and quantifying behavior—itself. 
He is to be commended for his attempt, but 
not, however, for his presentation. Excursions 
down theoretical byroads which lead nowhere 
are frequent. As one example, the relationship 
the author attempts to establish between his 
concepts and those of psychoanalysis (p. 46)
is extremely tenuous and apparently irrelevant.
In addition, much of the material is difficult to 
read, and unnecessarily so. Sentences with 
over sixty words are commonplace; some have 
more than one hundred. 

The reliability design, which involves a 
sequence of six phases and a number of tests,
is illustrated by a chart which itself presents an
interesting challenge to the reader in the way 
of interpretation. Even when the design and 
results are understood, conclusions regarding 
reliability cannot be drawn because of the way 
in which Chi Square is used. For example, 
when the expected frequency contributes one- 
half of the variance of the observed frequency, 
as in the example cited of two observers, a limit 
is imposed on the Chi Square value obtained. 
Furthermore, there is no indication that the 
scores recorded are of the same behavior. Al- 
though Observer A and Observer B both re- 
corded five scores under category 10 during a 
twelve minute recording (Table 2), there is no 
assurance that they were observing the same 
behavior when they did so. 
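
For the two-observer case, a short derivation makes both objections concrete. Assume, as an illustration, that Observers A and B record a and b scores in one category and that the expected frequency is their mean, e = (a + b)/2:

```latex
\chi^2 = \frac{(a-e)^2}{e} + \frac{(b-e)^2}{e}
       = \frac{2\left(\frac{a-b}{2}\right)^2}{\frac{a+b}{2}}
       = \frac{(a-b)^2}{a+b} \le a+b .
```

The statistic can never exceed the total number of scores in the category, and it reaches that ceiling only when one observer scores nothing at all. It is also exactly zero whenever a = b, as with the five scores each under category 10, whether or not the two observers scored the same acts.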

Interaction process analysis is applicable to a 
wide variety of groups and inter-personal situ- 
ations where an objective method of behavior 
analysis is badly needed. Because of the pres- 
entation, however, the use of the method here 
will probably be limited—even among other 
researchers in the field, to whom the book is 
addressed. This is regrettable because the 
author has provided at least a foundation for 
further development, and possibly considerable
superstructure. 

Wesley Osterberg 

Prudential Insurance Co. 